Preface
to the Second Edition 


The major change in this new edition is an increase in the number of 
challenging problems. This was requested by our readers. Since the 
actuarial examinations are an excellent source of challenging problems, 
we have added 109 sample exam problems to our exercise sections. 
(Detailed solutions can be found in the solutions manual.) We thank the
Society of Actuaries for permission to use these problems. 


We have added three new sections which cover the bivariate normal 
distribution, joint moment generating functions and the multinomial 
distribution. 


The authors would like to thank the second edition review team: 
Leonard A. Asimow, ASA, Ph.D., Robert Morris University, and
Krupa S. Viswanathan, ASA, Ph.D., Temple University. 


Finally we would like to thank Gail Hall for her editorial work on the 
text and Marilyn Baleshiski for putting the book together. 


Matt Hassett
Don Stewart
Tempe, Arizona
June, 2006


Preface 


This text provides a first course in probability for students with a basic 
calculus background. It has been designed for students who are mostly 
interested in the applications of probability to risk management in vital 
modern areas such as insurance, finance, economics, and health sciences. 
The text has many features which are tailored for those students. 


Integration of applications and theory. Much of modern probability 
theory was developed for the analysis of important risk management 
problems. The student will see here that each concept or technique 
applies not only to the standard card or dice problems, but also to the 
analysis of insurance premiums, unemployment durations, and lives of 
mortgages. Applications are not separated as if they were an afterthought 
to the theory. The concept of pure premium for an insurance is 
introduced in a section on expected value because the pure premium is an 
expected value. 


Relevant applications. Applications will be taken from texts, published 
studies, and practical experience in actuarial science, finance, and 
economics. 


Development of key ideas through well-chosen examples. The text is 
not abstract, axiomatic or proof-oriented. Rather, it shows the student 
how to use probability theory to solve practical problems. The student 
will be introduced to Bayes’ Theorem with practical examples using 
trees and then shown the relevant formula. Expected values of 
distributions such as the gamma will be presented as useful facts, with 
proof left as an honors exercise. The student will focus on applying 
Bayes’ Theorem to disease testing or using the gamma distribution to 
model claim severity. 


Emphasis on intuitive understanding. Lack of formal proofs does not 
correspond to a lack of basic understanding. A well-chosen tree example 
shows most students what Bayes’ Theorem is really doing. A simple 
expected value calculation for the exponential distribution or a
polynomial density function demonstrates how expectations are found. 
The student should feel that he or she understands each concept. The 
words “beyond the scope of this text” will be avoided. 


Organization as a useful future reference. The text will present key 
formulas and concepts in clearly identified formula boxes and provide 
useful summary tables. For example, Appendix B will list all major 
distributions covered, along with the density function, mean, variance, 
and moment generating function of each. 


Use of technology. Modern technology now enables most students to 
solve practical problems which were once thought to be too involved. 
Thus students might once have integrated to calculate probabilities for an 
exponential distribution, but avoided the same problem for a gamma 
distribution with α = 5 and β = 3. Today any student with a TI-83
calculator or a personal computer version of MATLAB or Maple or
Mathematica can calculate probabilities for the latter distribution. The 
text will contain boxed Technology Notes which show what can be done 
with modern calculating tools. These sections can be omitted by students 
or teachers who do not have access to this technology, or required for 
classes in which the technology is available. 
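As an illustration of the kind of calculation meant here, the following is a minimal Python sketch for a gamma distribution with α = 5 and β = 3. It assumes the rate form of the gamma density, f(x) = β^α x^(α-1) e^(-βx) / Γ(α); the density is developed later in the text, so treat that form and the cutoff x = 2 as illustrative assumptions. For integer α the distribution function has a closed form, and Simpson's rule applied to the density serves as a numerical check.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density, rate form (assumed): beta**alpha * x**(alpha-1) * e**(-beta*x) / Gamma(alpha)."""
    if x < 0:
        return 0.0
    return beta**alpha * x**(alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

def gamma_cdf_integer_alpha(x, alpha, beta):
    """P(X <= x) for integer alpha: 1 - sum_{k=0}^{alpha-1} e^(-beta*x) (beta*x)^k / k!."""
    if x <= 0:
        return 0.0
    bx = beta * x
    tail = sum(math.exp(-bx) * bx**k / math.factorial(k) for k in range(alpha))
    return 1.0 - tail

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

alpha, beta = 5, 3
x = 2.0
exact = gamma_cdf_integer_alpha(x, alpha, beta)
numeric = simpson(lambda t: gamma_pdf(t, alpha, beta), 0.0, x)
print(f"P(X <= {x}): closed form {exact:.6f}, Simpson's rule {numeric:.6f}")
```

The closed form is exact but only covers integer α; the numerical integral works for any α, which is the sort of flexibility a calculator or computer package provides.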


The practical and intuitive style of the text will make it useful for a 
number of different course objectives. 


A first course in probability for undergraduate mathematics majors. 
This course would enable sophomores to see the power and excitement 
of applied probability early in their programs, and provide an incentive to 
take further probability courses at higher levels. It would be especially 
useful for mathematics majors who are considering careers in actuarial 
science. 


An incentive course for talented business majors. The probability 
methods contained here are used on Wall Street, but they are not 
generally required of business students. There is a large untapped pool of 
mathematically-talented business students who could use this course 
experience as a base for a career as a “rocket scientist” in finance or as a 
mathematical economist. 


An applied review course for theoretically-oriented students. Many
mathematics majors in the United States take only an advanced, proof- 
oriented course in probability. This text can be used for a review of basic 
material in an understandable applied context. Such a review may be 
particularly helpful to mathematics students who decide late in their 
programs to focus on actuarial careers. 


The text has been class-tested twice at Arizona State University. Each 
class had a mixed group of actuarial students, mathematically-talented
students from other areas such as economics, and interested mathematics 
majors. The material covered in one semester was Chapters 1-7, Sections 
8.1-8.5, Sections 9.1-9.4, Chapter 10 and Sections 11.1-11.4. The text is 
also suitable for a pre-calculus introduction to probability using Chapters 
1-6, or a two-semester course which covers the entire text. As always, 
the amount of material covered will depend heavily on the preferences of 
the instructor. 


The authors would like to thank the following members of a review team 
which worked carefully through two draft versions of this text: 


Sam Broverman, ASA, Ph.D., University of Toronto 
Sheldon Eisenberg, Ph.D., University of Hartford 
Bryan Hearsey, ASA, Ph.D., Lebanon Valley College 
Tom Herzog, ASA, Ph.D., Department of HUD 
Eugene Spiegel, Ph.D., University of Connecticut 


The review team made many valuable suggestions for improvement and 
corrected many errors. Any errors which remain are the responsibility of 
the authors. 


A second group of actuaries reviewed the text from the point of view of 
the actuary working in industry. We would like to thank William 
Gundberg, EA, Brian Januzik, ASA, and Andy Ribaudo, ASA, ACAS, 
FCAS, for valuable discussions on the relation of the text material to the 
day-to-day work of actuarial science. 


Special thanks are due to others. Dr. Neil Weiss of Arizona State 
University was always available for extremely helpful discussions 
concerning subtle technical issues. Dr. Michael Ratliff, ASA, of 
Northern Arizona University and Dr. Stuart Klugman, FSA, of Drake 
University read the entire text and made extremely helpful suggestions. 


Thanks are also due to family members. Peggy Craig-Hassett provided
warm and caring support throughout the entire process of creating this 
text. John, Thia, Breanna, JJ, Laini, Ben, Flint, Elle and Sabrina all 
enriched our lives, and also provided motivation for some of our 
examples. 


We would like to thank the ACTEX team which turned the idea for this 
text into a published work. Richard (Dick) London, FSA, first proposed 
the creation of this text to the authors and has provided editorial guidance 
through every step of the project. Denise Rosengrant did the daily work 
of turning our copy into an actual book. 


Finally a word of thanks for our students. Thank you for working with us 
through two semesters of class-testing, and thank you for your positive 
and cooperative spirit throughout. In the end, this text is not ours. It is 
yours because it will only achieve its goals if it works for you. 


Matthew J. Hassett
Donald G. Stewart
Tempe, Arizona
May, 1999


Chapter 1
Probability: A Tool for 
Risk Management 


1.1 Who Uses Probability? 


Probability theory is used for decision-making and risk management 
throughout modern civilization. Individuals use probability daily, 
whether or not they know the mathematical theory in this text. If a 
weather forecaster says that there is a 90% chance of rain, people carry 
umbrellas. The “90% chance of rain” is a statement of a probability. If a 
doctor tells a patient that a surgery has a 50% chance of an unpleasant 
side effect, the patient may want to look at other possible forms of 
treatment. If a famous stock market analyst states that there is a 90% 
chance of a severe drop in the stock market, people sell stocks. All of us 
make decisions about the weather, our finances and our health based on 
percentage statements which are really probability statements. 

Because probabilities are so important in our analysis of risk, 
professionals in a wide range of specialties study probability. Weather 
experts use probability to derive the percentages given in their forecasts. 
Medical researchers use probability theory in their study of the
effectiveness of new drugs and surgeries. Wall Street firms hire mathematicians
to apply probability in the study of investments. 

The insurance industry has a long tradition of using probability to 
manage its risks. If you want to buy car insurance, the price you will pay 
is based on the probability that you will have an accident. (This price is 
called a premium.) Life insurance becomes more expensive to purchase 
as you get older, because there is a higher probability that you will die. 
Group health insurance rates are based on the study of the probability 
that the group will have a certain level of claims. 

The professionals who are responsible for the risk management
and premium calculation in insurance companies are called actuaries. 
Actuaries take a long series of exams to be certified, and those exams 
emphasize mathematical probability because of its importance in 
insurance risk management. Probability is also used extensively in 
investment analysis, banking and corporate finance. To illustrate the 
application of probability in financial risk management, the next section 
gives a simplified example of how an insurance rate might be set using 
probabilities. 


1.2 An Example from Insurance


In 2002 deaths from motor vehicle accidents occurred at a rate of 15.5
per 100,000 population.¹ This is really a statement of a probability. A
mathematician would say that the probability of death from a motor 
vehicle accident in the next year is 15.5/100,000 = .000155. 

Suppose that you decide to sell insurance and offer to pay $10,000 
if an insured person dies in a motor vehicle accident. (The money will 
go to a beneficiary who is named in the policy — perhaps a spouse, a 
close friend, or the actuarial program at your alma mater.) Your idea is 
to charge for the insurance and use the money obtained to pay off any 
claims that may occur. The tricky question is what to charge. 

You are optimistic and plan to sell 1,000,000 policies. If you
believe the rate of 15.5 deaths from motor vehicles per 100,000
population still holds today, you would expect to have to pay 155 claims on
your 1,000,000 policies. You will need 155(10,000) = $1,550,000 to
pay those claims. Since you have 1,000,000 policyholders, you can
charge each one a premium of $1.55. The charge is small, but
1.55(1,000,000) = $1,550,000 gives you the money you will need to
pay claims.
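The arithmetic of this pricing example can be written out in a few lines. The sketch below uses Python's fractions module so the figures come out exactly, and, like the text, it ignores interest, expenses, profit, and random variation in the number of claims. The last line checks that the premium per policy is simply the benefit times the probability of a claim.

```python
from fractions import Fraction

# Inputs taken from the example in the text.
death_rate = Fraction("15.5") / 100_000   # 15.5 deaths per 100,000 people per year
policies = 1_000_000                      # number of policies sold
benefit = 10_000                          # dollars paid on each claim

expected_claims = death_rate * policies           # claims you expect to pay
total_needed = expected_claims * benefit          # money needed to pay them
premium_per_policy = total_needed / policies      # share charged to each policyholder

print("probability of a claim:", float(death_rate))
print("expected claims:", expected_claims)
print("total needed: $", total_needed)
print("premium per policy: $", float(premium_per_policy))

# The same premium is the benefit times the claim probability.
print(premium_per_policy == death_rate * benefit)
```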

This example is oversimplified. In the real insurance business you 
would earn interest on the premiums until the claims had to be paid. 
There are other more serious questions. Should you expect exactly 155 
claims from your 1,000,000 clients just because the national rate is 15.5 
claims in 100,000? Does the 2002 rate still apply? How can you pay 
expenses and make a profit in addition to paying claims? To answer
these questions requires more knowledge of probability, and that is why
this text does not end here. However, the oversimplified example makes
a point. Knowledge of probability can be used to pool risks and provide
useful goods like insurance. The remainder of this text will be devoted to
teaching the basics of probability to students who wish to apply it in
areas such as insurance, investments, finance and medicine.

¹ Statistical Abstract of the United States, 1996. Table No. 138, page 101.


1.3 Probability and Statistics 


Statistics is a discipline which is based on probability but goes beyond 
probability to solve problems involving inferences based on sample data. 
For example, statisticians are responsible for the opinion polls which 
appear almost every day in the news. In such polls, a sample of a few
thousand voters is asked to answer a question such as "Do you think
the president is doing a good job?" The results of this sample survey are 
used to make an inference about the percentage of all voters who think 
that the president is doing a good job. The insurance problem in Section 
1.2 requires use of both probability and statistics. In this text, we will 
not attempt to teach statistical methods, but we will discuss a great deal 
of probability theory that is useful in statistics. It is best to defer a 
detailed discussion of the difference between probability and statistics 
until the student has studied both areas. It is useful to keep in mind that 
the disciplines of probability and statistics are related, but not exactly the 
same. 


1.4 Some History 


The origins of probability are a piece of everyday life; the subject was 
developed by people who wished to gamble intelligently. Although 
games of chance have been played for thousands of years, the 
development of a systematic mathematics of probability is more recent. 
Mathematical treatments of probability appear to have begun in Italy in
the latter part of the fifteenth century. A gambler’s manual which 
considered interesting problems in probability was written by Cardano 
(1501-1576).

The major advance which led to the modern science of probability 
was the work of the French mathematician Blaise Pascal. In 1654 Pascal 
was given a gaming problem by the gambler Chevalier de Méré. The
problem of points dealt with the division of proceeds of an interrupted 
game. Pascal entered into correspondence with another French
mathematician, Pierre de Fermat. The problem was solved in this correspondence,
and this work is regarded as the starting point for modern probability. 

It is important to note that within twenty years of Pascal’s work, 
differential and integral calculus was being developed (independently) 
by Newton and Leibniz. The subsequent development of probability 
theory relied heavily on calculus. 

Probability theory developed at a steady pace during the 
eighteenth and nineteenth centuries. Contributions were made by leading 
scientists such as James Bernoulli, de Moivre, Legendre, Gauss and 
Poisson. Their contributions paved the way for very rapid growth in the 
twentieth century. 

Probability is of more recent origin than most of the mathematics 
covered in university courses. The computational methods of freshman 
calculus were known in the early 1700’s, but many of the probability 
distributions in this text were not studied until the 1900’s. The 
applications of probability in risk management are even more recent. For 
example, the foundations of modern portfolio theory were developed by 
Harry Markowitz [11] in 1952. The probabilistic study of mortgage 
prepayments was developed in the late 1980’s to study financial 
instruments which were first created in the 1970’s and early 1980’s. 

It would appear that actuaries have a longer tradition of use of 
probability; a text on life contingencies was published in 1771.²
However, modern stochastic probability models did not seriously 
influence the actuarial profession until the 1970's, and actuarial 
researchers are now actively working with the new methods developed 
for use in modern finance. The July 2005 issue of the North American
Actuarial Journal that is sitting on my desk has articles with titles like 
"Minimizing the Probability of Ruin When Claims Follow Brownian 
Motion With Drift." You can't read this article unless you know the 
basics contained in this book and some more advanced topics in 
probability. 

Probability is a young area, with most of its growth in the
twentieth century. It is still developing rapidly and being applied in a wide
range of practical areas. The history is of interest, but the future will be 
much more interesting. 


² See the section on Historical Background in the 1999 Society of Actuaries Yearbook,
page 5.


1.5 Computing Technology


Modern computing technology has made some practical problems easier to 
solve. Many probability calculations involve rather difficult integrals; we 
can now compute these numerically using computers or modern 
calculators. Some problems are difficult to solve analytically but can be 
studied using computer simulation. In this text we will give examples of 
the use of technology in most sections. We will refer to results obtained 
using the TI-83 and TI BA II Plus Professional calculators and Microsoft®
EXCEL, but will not attempt to teach the use of those tools. The
technology sections will be clearly boxed off to separate them from the 
remainder of the text. Students who do not have the technological 
background should be aware that this will in no way restrict their 
understanding of the theory. However, the technology discussions should 
be valuable to the many students who already use modern calculators or 
computer packages. 


Chapter 2 
Counting for Probability 


2.1 What Is Probability?


People who have never studied the subject understand the intuitive ideas 
behind the mathematical concept of probability. Teachers (including the 
authors of this text) usually begin a probability course by asking the 
students if they know the probability of a coin toss coming up heads. 
The obvious answer is 50% or 1/2, and most people give the obvious
answer with very little hesitation. The reasoning behind this answer is 
simple. There are two possible outcomes of the coin toss, heads or tails. 
If the coin comes up heads, only one of the two possible outcomes has 
occurred. There is one chance in two of tossing a head. 

The simple reasoning here is based on an assumption — the coin 
must be fair, so that heads and tails are equally likely. If your gambler 
friend Fast Eddie invites you into a coin tossing game, you might suspect 
that he has altered the coin so that he can get your money. However, if 
you are willing to assume that the coin is fair, you count possibilities and 
come up with 1/2.

Probabilities are evaluated by counting in a wide variety of 
situations. Gambling related problems involving dice and cards are 
typically solved using counting. For example, suppose you are rolling a 
single six-sided die whose sides bear the numbers 1, 2, 3, 4, 5 and 6. 
You wish to bet on the event that you will roll a number less than 5. The 
probability of this event is 4/6, since the outcomes 1,2,3 and 4 are less 
than 5 and there are six possible outcomes (assumed equally likely). The 
approach to probability used is summarized as follows:


Probability by Counting for Equally Likely Outcomes

$$\text{Probability of an event} = \frac{\text{Number of outcomes in the event}}{\text{Total number of possible outcomes}}$$
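A minimal sketch of the counting rule in Python, assuming the outcomes are equally likely (the function cannot check that assumption for you). The helper name probability_by_counting is introduced here for illustration; Fraction keeps the answer exact, so 4/6 is reported in lowest terms.

```python
from fractions import Fraction

def probability_by_counting(event, sample_space):
    """Counting rule for equally likely outcomes: |event| / |sample space|."""
    if not event <= sample_space:          # an event must lie inside the sample space
        raise ValueError("event is not a subset of the sample space")
    return Fraction(len(event), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
less_than_5 = {x for x in die if x < 5}    # the outcomes 1, 2, 3 and 4

print(probability_by_counting(less_than_5, die))       # 2/3, the same as 4/6
print(probability_by_counting({"H"}, {"H", "T"}))       # 1/2 for a fair coin
```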


Part of the work of this chapter will be to introduce a more precise 
mathematical framework for this counting definition. However, this is 
not the only way to view probability. There are some cases in which 
outcomes may not be equally likely. A die or a coin may be altered so
that its outcomes are not equally likely. Suppose that you are tossing a
coin and suspect that it is not fair. Then the probability of tossing a head 
cannot be determined by counting, but there is a simple way to estimate 
that probability — simply toss the coin a large number of times and 
count the number of heads. If you toss the coin 1000 times and observe 
650 heads, your best estimate of the probability of a head on one toss is 
650/1000 = .65. In this case you are using a relative frequency 
estimate of a probability. 


Relative Frequency Estimate of the Probability of an Event

$$\text{Probability of an event} \approx \frac{\text{Number of times the event occurs in } n \text{ trials}}{n}$$


We now have two ways of looking at probability, the counting 
approach for equally likely outcomes and the relative frequency 
approach. This raises an interesting question. If outcomes are equally 
likely, will both approaches lead to the same probability? For example, if 
you try to find the probability of tossing a head for a fair coin by tossing 
the coin a large number of times, should you expect to get a value of 1/2?
The answer to this question is “not exactly, but for a very large number
of tosses you are highly likely to get an answer close to 1/2.” The more
tosses, the more likely you are to be very close to 1/2. We had our
computer simulate different numbers of coin tosses, and came up with 
the following results. 


Number of Tosses | Number of Heads | Probability Estimate
4 | 1 | .25
100 | 54 | .54
10,000 | 4985 | .4985
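A simulation of this kind is easy to repeat. The sketch below uses Python's standard random module; the seed and the list of toss counts are arbitrary choices, so its counts will not reproduce the results above, only the pattern of estimates settling near 1/2 as the number of tosses grows. The function name and its p_heads parameter are introduced here for illustration; setting p_heads to something other than 0.5 models an altered coin like the one that produced 650 heads in 1000 tosses.

```python
import random

def relative_frequency_of_heads(n_tosses, p_heads=0.5, seed=None):
    """Simulate n_tosses coin tosses; return (number of heads, heads / n_tosses).

    p_heads is the chance of heads on each toss (0.5 models a fair coin).
    """
    rng = random.Random(seed)                  # private generator, reproducible with a seed
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads, heads / n_tosses

for n in (4, 100, 10_000, 100_000):
    heads, estimate = relative_frequency_of_heads(n, seed=2006)
    print(f"{n:>7} tosses {heads:>7} heads  estimate {estimate:.4f}")
```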


More will be said later in the text about the mathematical
reasoning underlying the relative frequency approach. Many texts identify a
third approach to probability. That is the subjective approach to 
probability. Using this approach, you ask a well-informed person for his 
or her personal estimate of the probability of an event. For example, one 
of your authors worked on a business valuation problem which required 
knowledge of the probability that an individual would fail to make a 
monthly mortgage payment to a company. He went to an executive of 
the company and asked what percent of individuals failed to make the 
monthly payment in a typical month. The executive, relying on his 
experience, gave an estimate of 3%, and the valuation problem was 
solved using a subjective probability of .03. The executive’s subjective 
estimate of 3% was based on a personal recollection of relative 
frequencies he had seen in the past. 

In the remainder of this chapter we will work on building a more 
precise mathematical framework for probability. The counting approach 
will play a big part in this framework, but the reader should keep in mind 
that many of the probability numbers actually used in calculation may 
come from relative frequencies or subjective estimates. 


2.2 The Language of Probability; Sets, Sample Spaces 
and Events 


If probabilities are to be evaluated by counting outcomes of a probability 
experiment, it is essential that all outcomes be specified. A person who 
is not familiar with dice does not know that the possible outcomes for a 
single die are 1, 2, 3, 4, 5 and 6. That person cannot find the probability 
of rolling a 1 with a single die because the basic outcomes are unknown. 
In every well-defined probability experiment, all possible outcomes must 
be specified in some way. 

The language of set theory is very useful in the analysis of
outcomes. Sets are covered in most modern mathematics courses, and the
reader is assumed to be familiar with some set theory. For the sake of
completeness, we will review some of the basic ideas of set theory. A set
is a collection of objects such as the numbers 1, 2, 3, 4, 5 and 6. These
objects are called the elements or members of the set. If the set is finite and
small enough that we can easily list all of its elements, we can describe
the set by listing all of its elements in braces. For the set above,
S = {1,2,3,4,5,6}. For large or infinite sets, the set-builder notation is
helpful. For example, the set of all positive real numbers may be written 
as 


S = {x | x is a real number and x > 0}.


Often it is assumed that the numbers in question are real numbers, and 
the set above is written as S = {x | x > 0}.
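Readers who program may find it helpful to see the two notations side by side in Python. This is an analogy, not part of the formal development: a listed set becomes a set literal, while an infinite set such as {x | x > 0} can only be represented by its membership rule. The names in_positive_reals and even_die_faces are illustrative only.

```python
# A small finite set can be listed element by element.
S_die = {1, 2, 3, 4, 5, 6}

# {x | x > 0} is infinite, so it is represented by its membership rule.
def in_positive_reals(x):
    """Return True when x belongs to {x | x > 0}."""
    return x > 0

# On a finite set, set-builder notation maps onto a set comprehension.
even_die_faces = {x for x in S_die if x % 2 == 0}

print(even_die_faces)                                 # {2, 4, 6}
print(in_positive_reals(3.5), in_positive_reals(0))   # True False
```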

We will review more set theory as needed in this chapter. The 
important use of set theory here is to provide a precise language for 
dealing with the outcomes in a probability experiment. The definition 
below uses the set concept to refer to all possible outcomes of a 
probability experiment. 


Definition 2.1 The sample space S for a probability experiment 
is the set of all possible outcomes of the experiment. 


Example 2.1 A single die is rolled and the number facing up
recorded. The sample space is S = {1,2,3,4,5,6}. □


Example 2.2 A coin is tossed and the side facing up is recorded.
The sample space is S = {H,T}. □


Many interesting applications involve a simple two-element 
sample space. The following examples are of this type. 


Example 2.3 (Death of an insured) An insurance company is 
interested in the probability that an insured will die in the next year. The 
sample space is S = {death, survival}. □


Example 2.4 (Failure of a part in a machine) A manufacturer is 
interested in the probability that a crucial part in a machine will fail in 
the next week. The sample space is S = {failure, survival}. □


Example 2.5 (Default of a bond) Companies borrow money they
need by issuing bonds. A bond is typically sold in $1000 units which
have a fixed interest rate such as 8% per year for twenty years. When
you buy a bond for $1000, you are actually loaning the company your
$1000 in return for 8% interest per year. You are supposed to get your
$1000 loan back in twenty years. If the company issuing the bonds has
financial trouble, it may declare bankruptcy and default by failing to pay
your money back. Investors who buy bonds wish to find the probability
of default. The sample space is S = {default, no default}. □


Example 2.6 (Prepayment of a mortgage) Homeowners usually 
buy their homes by getting a mortgage loan which is repaid by monthly 
payments. The homeowner usually has the right to pay off the mortgage 
loan early if that is desirable — because the homeowner decides to move 
and sell the house, because interest rates have gone down, or because 
someone has won the lottery. Lenders may lose or gain money when a 
loan is prepaid early, so they are interested in the probability of 
prepayment. If the lender is interested in whether the loan will prepay in 
the next month, the sample space is S = {prepayment, no prepayment}. 


The simple sample spaces above are all of the same type.
Something (a bond, a mortgage, a person, or a part) either continues or
disappears. Despite this deceptive simplicity, the probabilities involved 
are of great importance. If a part in your airplane fails, you may become 
an insurance death — leading to the prepayment of your mortgage and a 
strain on your insurance company and its bonds. The probabilities are 
difficult and costly to estimate. Note also that the coin toss sample space
{H,T} was the only one in which the two outcomes were equally likely.
Luckily for most of us, insured individuals are more likely to live than 
die and bonds are more likely to succeed than to default. 

Not all sample spaces are so small or so simple. 


Example 2.7 An insurance company has sold 100 individual life 
insurance policies. When an insured individual dies, the beneficiary 
named in the policy will file a claim for the amount of the policy. You 
wish to observe the number of claims filed in the next year. The sample 
space consists of all integers from 0 to 100, so S = {0,1,2,...,100}. □


Some of the previous examples may be looked at in slightly
different ways that lead to different sample spaces. The sample space is 
determined by the question you are asking. 


Example 2.8 An insurance company sells life insurance to a
30-year-old female. The company is interested in the age of the insured
when she eventually dies. If the company assumes that the insured will
not live to 110, the sample space is S = {30,31,...,109}. □


Example 2.9 A mortgage lender makes a 30-year monthly
payment loan. The lender is interested in studying the month in which
the mortgage is paid off. Since there are 360 months in 30 years, the
sample space is S = {1,2,3,...,359,360}. □


The sample space can also be infinite. 


Example 2.10 A stock is purchased for $100. You wish to 
observe the price it can be sold for in one year. Since stock prices are 
quoted in dollars and fractions of dollars, the stock could have any
non-negative rational number as its future value. The sample space consists
of all non-negative rational numbers, S = {x | x ≥ 0 and x rational}.
This does not imply that the price outcome of $1,000,000,000 is highly
likely in one year, just that it is possible. Note that the price outcome
of 0 is also possible. Stocks can become worthless. □


The above examples show that the sample space for an experiment 
can be a small finite set, a large finite set, or an infinite set. 

In Section 2.1 we looked at the probability of events which were 
specified in words, such as “toss a head” or “roll a number less than 5." 
These events also need to be translated into clearly specified sets. For 
example, if a single die is rolled, the event “roll a number less than 5” 
consists of the outcomes in the set E = {1,2,3,4}. Note that the set E is 
a subset of the sample space S, since every element of E is an element 
of S. This leads to the following set-theoretical definition of an event. 


Definition 2.2 An event is a subset of the sample space S.

This set-theoretic definition of an event often causes some
unnecessary confusion since people think of an event as something
described in words like “roll a number less than 5 on a roll of a single
die.” There is no conflict here. The definition above reminds you that
you must take the event described in words and determine precisely what 
outcomes are in the event. Below we give a few examples of events 
which are stated in words and then translated into subsets of the sample 
space. 


Example 2.11 A coin is tossed. You wish to find the probability 
of the event “toss a head.” The sample space is S = {H,T}. The event
is the subset E = {H}. □


Example 2.12 An insurance company has sold 100 individual life 
policies. The company is interested in the probability that at most 5 of the 
policies have death benefit claims in the next year. The sample space is 
S = {0,1,2,...,100}. The event E is the subset {0,1,2,3,4,5}. □
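As a sketch, the sample spaces of Examples 2.11 and 2.12 and their events can be written as Python sets, with Definition 2.2 checked by the subset operator `<=`. The variable names are chosen here for illustration.

```python
# Sample spaces from Examples 2.11 and 2.12.
coin = {"H", "T"}
claims = set(range(101))            # S = {0,1,2,...,100}

# Events described in words, translated into subsets.
toss_a_head = {"H"}
at_most_5_claims = {n for n in claims if n <= 5}

# Definition 2.2: each event is a subset of its sample space.
print(toss_a_head <= coin, at_most_5_claims <= claims)   # True True
print(sorted(at_most_5_claims))                          # [0, 1, 2, 3, 4, 5]
```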


Example 2.13 You buy a stock for $100 and plan to sell it one 
year later. You are interested in the event E that you make a profit when 
the stock is sold. The sample space is S = {x | x ≥ 0 and x rational},
the set of all possible future prices. The event E is the subset
E = {x | x > 100 and x rational}, the set of all possible future prices
which are greater than the $100 you paid. □


Problems involving selections from a standard 52-card deck are
common in beginning probability courses. Such problems reflect the origins 
of probability. To make listing simpler in card problems, we will adopt the 
following abbreviation system: 


A: Ace K: King Q: Queen J: Jack 
S: Spade H: Heart D: Diamond C: Club 


We can then describe individual cards by combining letters and 
numbers. For example KH will stand for the king of hearts and 2D for 
the 2 of diamonds. 


Example 2.14 A standard 52 card deck is shuffled and a card is 
picked at random. You are interested in the event that the card is a king. 
The sample space, S = {AS, KS, ..., 3C, 2C}, consists of all 52 cards.
The event E consists of the four kings, E = {KS, KH, KD, KC}.
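
Because an event is just a subset of the sample space, Example 2.14 translates directly into Python sets. This is a minimal sketch; writing the ten as "10" and the names `RANKS`, `SUITS`, `deck` and `kings` are illustrative choices, not notation from the text.

```python
# Sample space for one card drawn from a standard 52-card deck,
# using the abbreviations above (e.g. "KH" = king of hearts).
RANKS = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
SUITS = ["S", "H", "D", "C"]
deck = {rank + suit for rank in RANKS for suit in SUITS}

# The event "the card is a king" is the subset of cards whose rank is K.
kings = {card for card in deck if card.startswith("K")}

print(len(deck))       # 52
print(sorted(kings))   # ['KC', 'KD', 'KH', 'KS']
print(kings <= deck)   # True: the event is a subset of S
```
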


The examples of sample spaces and events given above are
straightforward. In many practical problems things become much more complex.
The following sections introduce more set theory and some counting 
techniques which will help in analyzing more difficult problems. 


2.3 Compound Events; Set Notation 


When we refer to events in ordinary language, we often negate them (the 
card drawn is not a king) or combine them using the words “and” or “or”
(the card drawn is a king or an ace). Set theory has a convenient notation 
for use with such compound events. 


2.3.1 Negation 
The event not E is written as ~E. (This may also be written as Ē.)


Example 2.15 A single die is rolled, S = {1,2,3,4,5,6}. The
event E is the event of rolling a number less than 5, so E = {1,2,3,4}.
E does not occur when a 5 or 6 is rolled. Thus ~E = {5,6}.


Note that the event ~E is the set of all outcomes in the sample
space which are not in the original event set E. The result of removing
all elements of E from the original sample space S is referred to as
S − E. Thus ~E = S − E. This set is called the complement of E.
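
The rule ~E = S − E is exactly Python's set difference. A minimal sketch using the die of Example 2.15; the variable names are illustrative.

```python
# Sample space for a single roll of a die.
S = {1, 2, 3, 4, 5, 6}

# Event E: "roll a number less than 5".
E = {x for x in S if x < 5}

# The complement ~E is S - E: every outcome of S that is not in E.
not_E = S - E
print(sorted(not_E))  # [5, 6]
```
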


Example 2.16 You buy a stock for $100 and wish to evaluate the 
probability of selling it for a higher price x in one year. The sample 
space is S = {x | x > 0 and x rational}. The event of interest is
E = {x | x > 100 and x rational}. The negation ~E is the event that no
profit is made on the sale, so ~E can be written as


~E = {x | 0 < x ≤ 100 and x rational} = S − E.
This can be portrayed graphically on a number line.


[Figure: number line marked at 0 and 100, with ~E (no profit) for prices up to 100 and E (profit) for prices above 100.]


Graphical depiction of events is very helpful. The most common
tool for this is the Venn diagram, in which the sample space is 
portrayed as a rectangular region and the event is portrayed as a circular 
region inside the rectangle. The Venn diagram showing E and ~E is 
given in the following figure. 


2.3.2 The Compound Events A or B, A and B


We will begin by returning to the familiar example of rolling a single 
die. Suppose that we have the opportunity to bet on two different events:


A: an even number is rolled    B: a number less than 5 is rolled
A = {2,4,6}    B = {1,2,3,4}


If we bet that A or B occurs, we will win if any element of the two
sets above is rolled. 


A or B = {1,2,3,4,6}


In forming the set for A or B we have combined the sets A and B by 
listing all outcomes which appear in either A or B. The resulting set is 
called the union of A and B, and is written as A ∪ B. It should be clear
that for any two events A and B 


A or B = A ∪ B.


For the single die roll above, we could also decide to bet on the
event A and B. In that case, both the event A and the event B must 
occur on the single roll. This can happen only if an outcome occurs 
which is common to both events. 


A and B = {2,4}


In forming the set for A and B we have listed all outcomes which are in 
both sets simultaneously. This set is referred to as the intersection of A 
and B, and is written as A ∩ B. For any two events A and B


A and B = A ∩ B.


Example 2.17 Consider the insurance company which has written 
100 individual life insurance policies and is interested in the number of 
claims which will occur in the next year. The sample space is 
S = {0,1,2,...,100}. The company is interested in the following two 
events: 


A: there are at most 8 claims 
B: the number of claims is between 5 and 12 (inclusive) 


A and B are given by the sets 
A = {0,1,2,3,4,5,6,7,8}
and
B = {5,6,7,8,9,10,11,12}.
Then the events A or B and A and B are given by 
A or B = A ∪ B = {0,1,2,3,4,5,6,7,8,9,10,11,12}
and 


A and B = A ∩ B = {5,6,7,8}.
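
In Python the set operators `|` and `&` play the roles of ∪ (or) and ∩ (and). A short sketch reproducing Example 2.17; the variable names are illustrative.

```python
# Example 2.17: the number of claims among 100 policies.
S = set(range(101))                   # 0, 1, ..., 100 claims
A = {x for x in S if x <= 8}          # at most 8 claims
B = {x for x in S if 5 <= x <= 12}    # between 5 and 12 claims, inclusive

A_or_B = A | B     # union A ∪ B
A_and_B = A & B    # intersection A ∩ B

print(sorted(A_or_B))    # the integers 0 through 12
print(sorted(A_and_B))   # [5, 6, 7, 8]
```
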


The events A or B and A and B can also be represented using 
Venn diagrams, with overlapping circular regions representing A and B. 


[Figure: Venn diagrams of A ∪ B and A ∩ B, drawn as overlapping circles.]


2.3.3 New Sample Spaces from Old; Ordered Pair Outcomes 


In some situations the basic outcomes of interest are actually pairs of 
simpler outcomes. The following examples illustrate this. 


Example 2.18 (Insurance of a couple) Sometimes life insurance 
is written on a husband and wife. Suppose the insurer is interested in 
whether one or both members of the couple die in the next year. Then 
the insurance company must start by considering the following out- 
comes: 


D_H: death of the husband    S_H: survival of the husband
D_W: death of the wife    S_W: survival of the wife
Since the insurance company has written a policy insuring both husband 
and wife, the sample space of interest consists of pairs which show the 
status of both husband and wife. For example, the pair (D_H, S_W)


describes the outcome in which the husband dies but the wife survives. 
The sample space is 


S = {(D_H, S_W), (D_H, D_W), (S_H, S_W), (S_H, D_W)}.


In this sample space, events may be more complicated than they sound. 
Consider the following event: 


H: the husband dies in the next year


H = {(D_H, S_W), (D_H, D_W)}


The death of the husband is not a single outcome. The insurance
company has insured two people, and has different obligations for each of
the two outcomes in H. The death of the wife is similar. 


W: the wife dies in the next year 


W = {(D_H, D_W), (S_H, D_W)}
The events H or W and H and W are also sets of pairs.
H ∪ W = {(D_H, S_W), (D_H, D_W), (S_H, D_W)}
H ∩ W = {(D_H, D_W)}
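
The ordered-pair sample space of Example 2.18 is a Cartesian product, which `itertools.product` builds directly. A sketch, assuming outcomes are encoded as strings such as "D_H"; that encoding is ours, not the book's.

```python
from itertools import product

# Status labels for each spouse, mirroring the notation D_H, S_H, D_W, S_W.
husband = ["D_H", "S_H"]
wife = ["D_W", "S_W"]

# The sample space is every (husband status, wife status) pair.
S = set(product(husband, wife))

H = {pair for pair in S if pair[0] == "D_H"}   # the husband dies
W = {pair for pair in S if pair[1] == "D_W"}   # the wife dies

print(len(S))          # 4
print(sorted(H | W))   # the three pairs in which at least one spouse dies
print(H & W)           # {('D_H', 'D_W')}
```
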

Similar reasoning can be used in the study of the failure of two 
crucial parts in a machine or the prepayment of two mortgages. 
2.4 Set Identities 
2.4.1 The Distributive Laws for Sets 


The distributive law for real numbers is the familiar 
a(b + c) = ab + ac. 


Two similar distributive laws for set operations are the following: 


A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)


A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)


These laws are helpful in dealing with compound events involving the 
connectives and and or. They tell us that 


A and (B or C) is equivalent to (A and B) or (A and C) 


and 
A or (B and C) is equivalent to (A or B) and (A or C). 


The validity of these laws can be seen using Venn diagrams. This is
pursued in the exercises. These identities are illustrated in the following 
example. 


Example 2.19 A financial services company is studying a large 
pool of individuals who are potential clients. The company offers to sell 
its clients stocks, bonds and life insurance. The events of interest are the 
following: 

S: the individual owns stocks 


B: the individual owns bonds 


I: the individual has life insurance coverage 


The distributive laws tell us that 
I ∩ (B ∪ S) = (I ∩ B) ∪ (I ∩ S)
and
I ∪ (B ∩ S) = (I ∪ B) ∩ (I ∪ S).
The first identity states that 


insured and (owning bonds or stocks) 


is equivalent to 


(insured and owning bonds) or (insured and owning stocks). 
The second identity states that 


insured or (owning bonds and stocks) 


is equivalent to 


(insured or owning bonds) and (insured or owning stocks).
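
The distributive laws can also be checked by brute force on a small sample space: enumerate every event and test both identities for every choice of A, B and C. This is a sketch; the three-element space U is an arbitrary assumption, and checking one finite case illustrates the laws rather than proving them.

```python
from itertools import combinations

def all_subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

U = {1, 2, 3}              # a small illustrative sample space
subsets = all_subsets(U)   # 2**3 = 8 possible events

# Test both distributive laws for every choice of events A, B, C.
holds = all(
    (A & (B | C)) == ((A & B) | (A & C))
    and (A | (B & C)) == ((A | B) & (A | C))
    for A in subsets for B in subsets for C in subsets
)
print(len(subsets), holds)   # 8 True
```
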


2.4.2 De Morgan's Laws


Two other useful set identities are the following: 


~(A ∪ B) = ~A ∩ ~B


~(A ∩ B) = ~A ∪ ~B


These laws state that


not(A or B) is equivalent to (not A) and (not B) 
and 
not(A and B) is equivalent to (not A) or (not B). 


As before, verification using Venn diagrams is left for the exercises. The
identities are seen more clearly through an example.


Example 2.20 We return to the events S (ownership of stock) and 
B (ownership of bonds) in the previous example. De Morgan’s laws 
state that 
~(S ∪ B) = ~S ∩ ~B
and
~(S ∩ B) = ~S ∪ ~B.


In words, the first identity states that if you don’t own stocks or bonds 
then you don’t own stocks and you don’t own bonds (and vice versa). 
The second identity states that if you don’t own both stocks and bonds, 
then you don’t own stocks or you don’t own bonds (and vice versa).
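
De Morgan's laws for Example 2.20 can be checked on a concrete client pool. The clients and their holdings below are hypothetical data invented only for illustration.

```python
# A hypothetical pool of five clients; the names are illustrative only.
pool = {"Ann", "Bob", "Cal", "Dee", "Eve"}
S = {"Ann", "Bob"}   # owns stocks
B = {"Bob", "Cal"}   # owns bonds

def complement(event):
    """Return ~event relative to the pool of clients."""
    return pool - event

first_law = complement(S | B) == (complement(S) & complement(B))
second_law = complement(S & B) == (complement(S) | complement(B))
print(first_law, second_law)        # True True
print(sorted(complement(S | B)))    # ['Dee', 'Eve']: owns neither
```
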


De Morgan's laws and the distributive laws are worth remembering.
They enable us to simplify events which are stated verbally or in set
notation. They will be useful in the counting and probability problems
which follow.


2.5 Counting 
Since many (not all) probability problems will be solved by counting
outcomes, this section will develop a number of counting principles
which will prove useful in solving probability problems.


2.5.1 Basic Rules


We will first illustrate the basic counting rules by example and then state 
the general rules. In counting, we will use the convenient notation 


n(A) = the number of elements in the set (or event) A.


Example 2.21 A neighborhood association has 100 families on its
membership list. 78 of the families have a credit card¹ and 50 of the
families are currently paying off a car loan. 41 of the families have both 
a credit card and a car loan. A financial planner intends to call on one of 
the 100 families today. The planner’s sample space consists of the 100 
families in the association. The events of interest to the planner are the 
following:


C: the family has a credit card    L: the family has a car loan
We are given the following information:
n(C) = 78    n(L) = 50    n(L ∩ C) = 41


The planner is also interested in the answers to some other questions. 
For example, she would first like to know how many families do not 
have credit cards. Since there are 100 families and 78 have credit cards, 
the number of families that do not have credit cards is 100 − 78 = 22.
This can be written using our counting notation as 


n(~C) = n(S) − n(C).


This reasoning clearly works in all situations, giving the following 
general rule for any finite sample space S and event A. 


n(~A) = n(S) − n(A)   (2.5)


Example 2.22 The planner in the previous example would also 
like to know how many of the 100 families had a credit card or a car 
loan. If she adds n(C) = 78 and n(L) = 50, the result of 128 is clearly 
too high. This happened because in the 128 figure each of the 41 
families with both a credit card and a car loan was counted twice. To 
reverse the double counting and get the correct answer, subtract 41 from 
128 to get the correct count of 87. This is written below in our counting 
notation. 


n(C ∪ L) = n(C) + n(L) − n(C ∩ L) = 78 + 50 − 41 = 87


¹ In 2001, 72.7% of American families had credit cards. (Statistical Abstract of the
United States, 2004-5, Table No. 1186.)


The reasoning in Example 2.22 also applies in general to any two
events A and B in any finite sample space. 


n(A ∪ B) = n(A) + n(B) − n(A ∩ B)   (2.6)
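
Equations (2.5) and (2.6) need only the counts, not the sets themselves. A minimal sketch with hypothetical helper names `count_complement` and `count_union`, applied to the figures of Examples 2.21 and 2.22.

```python
def count_complement(n_s, n_a):
    """Equation (2.5): n(~A) = n(S) - n(A)."""
    return n_s - n_a

def count_union(n_a, n_b, n_a_and_b):
    """Equation (2.6): n(A ∪ B) = n(A) + n(B) - n(A ∩ B)."""
    return n_a + n_b - n_a_and_b

# Figures from Examples 2.21 and 2.22.
n_S, n_C, n_L, n_LC = 100, 78, 50, 41
print(count_complement(n_S, n_C))   # 22 families without a credit card
print(count_union(n_C, n_L, n_LC))  # 87 families with a card or a loan
```
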


Example 2.23 A single card is drawn at random from a well-shuffled
deck. The events of interest are the following:


H: the card drawn is a heart    n(H) = 13
K: the card is a king    n(K) = 4
C: the card is a club    n(C) = 13


The compound event H ∩ K occurs when the card drawn is both a heart
and a king (i.e., the card is the king of hearts). Then n(H ∩ K) = 1 and


n(H ∪ K) = n(H) + n(K) − n(H ∩ K) = 13 + 4 − 1 = 16.


The situation is somewhat simpler if we look at the events H and C. 
Since a single card is drawn, the event H ∩ C can only occur if the
single card drawn is both a heart and a club, which is impossible. There
are no outcomes in H ∩ C, and n(H ∩ C) = 0. Then


n(H ∪ C) = n(H) + n(C) − n(H ∩ C) = 13 + 13 − 0 = 26.
More simply,
n(H ∪ C) = n(H) + n(C).
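
Enumerating the whole deck confirms the counts in Example 2.23 and shows directly that H ∩ C is empty. A sketch, assuming cards are encoded as strings such as "KH" and "10C" (an illustrative choice).

```python
RANKS = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
SUITS = ["S", "H", "D", "C"]
deck = {rank + suit for rank in RANKS for suit in SUITS}

hearts = {c for c in deck if c.endswith("H")}
kings = {c for c in deck if c.startswith("K")}
clubs = {c for c in deck if c.endswith("C")}

# Equation (2.6) agrees with a direct count of the union.
assert len(hearts | kings) == len(hearts) + len(kings) - len(hearts & kings)

print(len(hearts & kings), len(hearts | kings))   # 1 16
print(hearts & clubs, len(hearts | clubs))        # set() 26: mutually exclusive
```
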


The two events H and C are called mutually exclusive because
they cannot occur together. The occurrence of H excludes the possibility 
of C and vice versa. There is a convenient way to write this in set 
notation. 


Definition 2.3 The empty set is the set which has no elements. It 
is denoted by the symbol ∅.


In the above example, we could write H ∩ C = ∅ to show that H
and C are mutually exclusive. The same principle applies in general. 


Definition 2.4 Two events A and B are mutually exclusive if 
A ∩ B = ∅.


If A and B are mutually exclusive, then


n(A ∪ B) = n(A) + n(B).


2.5.2 Using Venn Diagrams in Counting Problems 


Venn diagrams are helpful in visualizing all of the components of a 
counting problem. This is illustrated in the following example. 


Example 2.24 The following Venn diagram is labeled to completely
describe all of the components of Example 2.22. In that example
the sample space consisted of 100 families. Recall that the events of
interest were C (the family has a credit card) and L (the family has a car
loan). We were given that n(C) = 78, n(L) = 50 and n(L ∩ C) = 41.
We found that n(L ∪ C) = 87. The Venn diagram below shows all this
and more.


[Figure: Venn diagram of C and L inside S, with regions labeled 37 (C only), 41 (C and L), 9 (L only) and 13 (neither).]


Since n(C) = 78 and n(L ∩ C) = 41, there are 78 families with credit
cards and 41 families with both a credit card and a car loan. This leaves
78 − 41 = 37 families with a credit card and no car loan. We write the
number 37 in the part of the region for C which does not intersect L.
Since n(L) = 50, there are only 50 − 41 = 9 families with a car loan and no credit
card, so we write 9 in the appropriate region. The total number of
families with either a credit card or a car loan is clearly given by
37 + 41 + 9 = 87. Finally, since n(S) = 100, there are 100 − 87 = 13
families with neither a credit card nor a car loan.


The numbers above could all be derived using set
identities and written in the following set theoretic terms:


n(L ∩ C) = 41
n(~L ∩ C) = 37
n(L ∩ ~C) = 9
n(~L ∩ ~C) = 13
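
The Venn-diagram bookkeeping of Example 2.24 is a fixed sequence of subtractions, so it can be written as one small function. A sketch; `venn_regions` is a hypothetical name, and the inputs are assumed consistent (no region count comes out negative).

```python
def venn_regions(n_s, n_c, n_l, n_lc):
    """Return the four region counts (L∩C, ~L∩C, L∩~C, ~L∩~C)
    of a two-event Venn diagram from n(S), n(C), n(L), n(L∩C)."""
    only_c = n_c - n_lc                        # credit card, no car loan
    only_l = n_l - n_lc                        # car loan, no credit card
    neither = n_s - (only_c + n_lc + only_l)   # outside both circles
    return n_lc, only_c, only_l, neither

print(venn_regions(100, 78, 50, 41))   # (41, 37, 9, 13)
```
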


However, the Venn diagram gives the relevant numbers much more 
quickly than symbolic manipulation. Some common counting problems 
are especially suited to the Venn diagram method, as the following 
example shows. 


Example 2.25 A small college has 340 business majors. It is 
possible to have a double major in business and liberal arts. There are 
125 such double majors, and 315 students majoring in liberal arts but not 
in business. How many students are in liberal arts or business? 

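Let B and L stand for majoring in business and liberal arts,
respectively. The given information allows us to fill in the Venn diagram
as follows. Since n(B) = 340 and n(B ∩ L) = 125, there are
340 − 125 = 215 students majoring in business but not liberal arts, and
we are told that n(L ∩ ~B) = 315.


[Figure: Venn diagram of B and L, with regions labeled 215 (B only), 125 (B and L) and 315 (L only).]


There are 215 + 125 + 315 = 655 students in business or liberal arts.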
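
Example 2.25 is the same bookkeeping run in reverse: once the three disjoint regions are known, their counts simply add. A sketch; the variable names are illustrative.

```python
n_B = 340          # business majors, including double majors
n_B_and_L = 125    # double majors in business and liberal arts
n_L_not_B = 315    # liberal arts majors not in business

n_B_not_L = n_B - n_B_and_L                     # business only
n_B_or_L = n_B_not_L + n_B_and_L + n_L_not_B    # sum of three disjoint regions

print(n_B_not_L, n_B_or_L)   # 215 655
```
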


The Venn diagram can also be used in counting problems involving
three events, but requires the following slightly more complicated
diagram.


Some problems of this type are given in the exercises.


2.5.3 Trees 
A tree gives a graphical display of all possible cases in a problem. 


Example 2.26 A coin is tossed twice. The tree which gives all 
possible outcomes is shown below. We create one branch for each of the 
two outcomes on the first toss, and then attach a second set of branches 
to each of the first to show the outcomes on the second toss. The results 
of the two tosses along each set of branches are listed at the right of the 
diagram.
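
Each path from the root of a tree to a leaf is one ordered outcome, so `itertools.product` lists the same branches the tree draws. A sketch; the helper name `toss_outcomes` is hypothetical.

```python
from itertools import product

def toss_outcomes(num_tosses):
    """List the branches of the coin-toss tree as strings such as 'HT'."""
    return ["".join(branch) for branch in product("HT", repeat=num_tosses)]

print(toss_outcomes(2))        # ['HH', 'HT', 'TH', 'TT']
print(len(toss_outcomes(3)))   # 8
```
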


A tree provides a simple display of all possible pairs of outcomes
in an experiment if the number of outcomes is not unreasonably large. It 
would not be reasonable to attempt a tree for an experiment in which 
two numbers between 1 and 100 were picked at random, but it is 
reasonable to give a tree to show the outcomes for three successive coin 
tosses. Such a tree is shown below. 


Trees will be used extensively in this text as visual aids in problem 
solving. Many problems in risk analysis can be better understood when 
all possibilities are displayed in this fashion. The next example gives a 
tree for disease testing. 


Example 2.27 A test for the presence of a disease has two 
possible outcomes — positive or negative. A positive outcome indicates 
that the tested person may have the disease, and a negative outcome 
indicates that the tested person probably does not have the disease. Note 
that the test is not perfect. There may be some misleading results. The 
possibilities are shown in the tree below. We have the following 
outcomes of interest: 


D: the person tested has the disease
~D: the person tested does not have the disease


Y: the test is positive 


N: the test is negative 


The outcome (~D, Y) is referred to as a false positive result. The person
tested does not have the disease, but nonetheless tests positive for it. The
outcome (D, N) is a false negative result.
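
The four branches of the disease-testing tree, with their labels, can be listed with `itertools.product`. A sketch; the string encoding of D, ~D, Y and N and the helper `classify` are illustrative assumptions.

```python
from itertools import product

STATUS = ["D", "~D"]   # has / does not have the disease
RESULT = ["Y", "N"]    # test positive / test negative

def classify(outcome):
    """Label one branch (status, result) of the disease-testing tree."""
    status, result = outcome
    if status == "~D" and result == "Y":
        return "false positive"
    if status == "D" and result == "N":
        return "false negative"
    return "correct"

for outcome in product(STATUS, RESULT):
    print(outcome, classify(outcome))
```
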


2.5.4 The Multiplication Principle for Counting 


The trees in the prior section illustrate a fundamental counting principle. 
In the case of two coin tosses, there were two choices for the outcome at 
the end of the first branch, and for each outcome on the first toss there 
were two more possibilities for the second branch. This led to a total of
2 × 2 = 4 outcomes. This reasoning is a particular instance of a very
useful general law. 


The Multiplication Principle for Counting


Suppose that the outcomes of an experiment consist of a 
combination of two separate tasks or actions. Suppose there are 
n possibilities for the first task, and that for each of these n 
possibilities there are k possible ways to perform the second 
task. Then there are nk possible outcomes for the experiment. 


Example 2.28 A coin is tossed twice. The first toss has n = 2 
possible outcomes and the second toss has k = 2 possible outcomes. The 
experiment (two tosses) has nk = 2 · 2 = 4 possible outcomes.


Example 2.29 An employee of a southwestern state can choose 
one of three group life insurance plans and one of five group health 
insurance plans. The total number of ways she can choose her complete 
life and health insurance package is 3 · 5 = 15.


The validity of this counting principle can be seen by considering 
a tree for the combination of tasks. There are n possibilities for the first 
branch, and for each first branch there are k possibilities for the second 
branch. This will lead to a total of nk combined branches. Another way
to present the rule schematically is the following:


Task 1 | Task 2 | Total outcomes
n      | k      | nk


The multiplication principle also applies to combined experiments 
consisting of more than two tasks. Earlier in this section we gave a tree to show all
possible outcomes of tossing a coin three times. There were 2 · 2 · 2 = 8
total outcomes for the combined experiment. This illustrates the general 
multiplication principle for counting. 

Suppose that the outcomes of an experiment consist of a combination
of k separate tasks or actions. If task i can be performed in n_i ways
for each combined outcome of the remaining tasks, for i = 1, ..., k, then
the total number of outcomes for the experiment is n_1 × n_2 × ... × n_k.
Schematically, we have the following:


Task 1 | Task 2 | ... | Task k | Total outcomes
n_1    | n_2    | ... | n_k    | n_1 × n_2 × ... × n_k


Example 2.30 A certain mathematician owns 8 pairs of socks, 4
pairs of pants, and 10 shirts. The number of different ways he can get
dressed is 8 · 4 · 10 = 320. (It is important to note that this solution only
applies if the mathematician will wear anything with anything else, 
which is a matter of concern to his wife.) 


The number of total possibilities in an everyday setting can be 
surprisingly large. 


Example 2.31 A restaurant has 9 appetizers, 12 main courses, and
6 desserts. Each main course comes with a salad, and there are 6 choices
for salad dressing. The number of different meals consisting of an
appetizer, a salad with dressing, a main course, and a dessert is therefore
9 · 6 · 12 · 6 = 3888.
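
The general multiplication principle is a product of the number of ways for each task, and a brute-force listing of all combined outcomes gives the same count. A sketch; `count_outcomes` is a hypothetical helper, and `math.prod` assumes Python 3.8 or later.

```python
from itertools import product
from math import prod

def count_outcomes(ways_per_task):
    """Multiplication principle: n_1 × n_2 × ... × n_k."""
    return prod(ways_per_task)

# Example 2.31: appetizer, salad dressing, main course, dessert.
print(count_outcomes([9, 6, 12, 6]))   # 3888

# Brute-force check of Example 2.30: list every (socks, pants, shirt) triple.
outfits = list(product(range(8), range(4), range(10)))
print(len(outfits), count_outcomes([8, 4, 10]))   # 320 320
```
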


2.5.5 Permutations 


In many practical situations it is necessary to arrange objects in order. If 
you were considering buying one of four different cars, you would be 
interested in a 1,2,3,4 ranking which ordered them from best to worst. 
If you are scheduling a meeting in which there are 5 different speakers, 
you must create a program which gives the order in which they speak. 


Definition 2.5 A permutation of n objects is an ordered arrange- 
ment of those objects. 


The number of permutations of n objects can be found using the
multiplication principle for counting.


Example 2.32 The number of ways that four different cars can be 
ranked is shown schematically below. 


The successive tasks here are to choose Ranks 1, 2, 3 and 4. At the
beginning there are 4 choices for Rank 1. After the first car is chosen, there
are 3 cars left for Rank 2. After 2 cars have been chosen, there are only 2
cars left for Rank 3. Finally, there is only one car left for Rank 4. By the
multiplication principle, the total number of rankings is 4 · 3 · 2 · 1 = 24.


The same reasoning works for the problem of arranging 5 speakers
in order. The total number of possibilities is 5 · 4 · 3 · 2 · 1 = 120. To
handle problems like this, it is convenient to use factorial notation. 


n! = n(n − 1)(n − 2) ··· 1


The notation n! is read as “n factorial.” The reasoning used in the
previous examples leads to another counting principle. 


First Counting Principle for Permutations 


The number of permutations of n objects is n!. 


Note: 0! is defined to be 1, the number of ways to arrange 0 objects. 
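
Python's `math.factorial` computes n!, and `itertools.permutations` lists the arrangements themselves, so the First Counting Principle can be checked on the four cars of Example 2.32. A sketch; the car labels are placeholders.

```python
from itertools import permutations
from math import factorial

cars = ["car 1", "car 2", "car 3", "car 4"]   # placeholder labels
rankings = list(permutations(cars))           # every ordered arrangement

print(len(rankings), factorial(4))   # 24 24
print(factorial(5))                  # 120 orders for 5 speakers
print(factorial(0))                  # 1, by definition
```
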


Example 2.33 The manager of a youth baseball team has chosen 
nine players to start a game. The total number of batting orders that is 
possible is the number of ways to arrange nine players in order, namely 
9! = 9 · 8 · 7 · 6 · 5 · 4 · 3 · 2 · 1 = 362,880. (When the authors coached
youth baseball, another coach stated that he had looked at all possible
batting orders and had picked the best one. Sure.)


The previous example shows that the number of permutations of n 
objects can be surprisingly large. Factorials grow rapidly as n increases, 
as shown in the following table. 


n  | n!
1  | 1
2  | 2
3  | 6
4  | 24
5  | 120
6  | 720
7  | 5,040
8  | 40,320
9  | 362,880
10 | 3,628,800
11 | 39,916,800
12 | 479,001,600


The number 52! has 68 digits and is too long to bother with presenting
here. This may interest card players, since 52! is the number of ways that 
a standard card deck can be put in order (shuffled). 
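
Python integers have arbitrary precision, so 52! can be computed exactly and its length checked. A sketch; the spot-checked values 9! = 362,880 and 11! = 39,916,800 come from the factorial table.

```python
from math import factorial

# Spot-check two factorial values.
print(factorial(9), factorial(11))   # 362880 39916800

orderings = factorial(52)            # ways to order a 52-card deck
print(len(str(orderings)))           # 68 digits
print(f"{orderings:.4e}")            # 8.0658e+67
```
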

Some problems involve arranging only r of the n objects in order. 


Example 2.34 Ten students are finalists in a scholarship competition.
The top three students will receive scholarships for $1000, $500
and $200. The number of ways the scholarships can be awarded is found
as follows:


$1000 | $500 | $200 | Total ways to rank
10    | 9    | 8    | 10 · 9 · 8 = 720


This is similar to Example 2.32. Any one of the 10 students can win the 
$1000 scholarship. Once that is awarded, there are only 9 left for the 
$500. Finally, there are only 8 left for the $200. Note that we could also 
write 


10 · 9 · 8 = 10!/7! = 10!/(10 − 3)!.


Example 2.34 is referred to as a problem of permuting 10 objects 
taken 3 at a time. 


Definition 2.6 A permutation of n objects taken r at a time is an 
ordered arrangement of r of the original n objects, where r ≤ n.


The reasoning used in the previous example can be used to derive 
a counting principle for permutations. 


Second Counting Principle for Permutations 


The number of permutations of n objects taken r at a time is
denoted by P(n, r).


P(n, r) = n(n − 1) ··· (n − r + 1) = n!/(n − r)!   (2.8)


Special Cases: P(n, n) = n!    P(n, 0) = 1
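
Equation (2.8) translates into one line of integer arithmetic. Python 3.8 and later also provide `math.perm(n, r)`, which computes the same quantity; the helper `permutations_count` is a hypothetical name used for illustration.

```python
from math import factorial, perm

def permutations_count(n, r):
    """Equation (2.8): P(n, r) = n! / (n - r)!."""
    return factorial(n) // factorial(n - r)

# P(10, 3), P(10, 4) and P(8, 3) from Examples 2.34 to 2.36.
for n, r in [(10, 3), (10, 4), (8, 3)]:
    print(n, r, permutations_count(n, r), perm(n, r))
# 10 3 720 720
# 10 4 5040 5040
# 8 3 336 336
```
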


Technology Note


Calculation of P(n,r) is simple using modern calculators. Inexpensive
scientific calculators typically have a factorial function key.
This makes the computation of P(10,3) above simple: find 10! and
divide it by 7! = (10 − 3)!.

More powerful calculators find quantities like P(10,3) directly. 
For example: 

(a) On the TI-83 calculator, in the MATH menu under PRB,
you will find the operator nPr. If you key in 10 nPr 3, you
will get the answer 720 directly.

(b) On the TI BA II Plus Professional calculator, nPr is available
as a 2ND function on the [-] key.


Because modern calculators make these computations so easy, we will 
not avoid realistic problems in which answers involve large factorials.²


Many computer packages will compute factorials. The spreadsheet 
programs that are widely used on personal computers in business also 
have factorial functions. For example, Microsoft® EXCEL has a function
FACT(cell) which calculates the factorial of the number in the cell.


Example 2.35 Suppose a fourth scholarship for $100 is made 
available to the 10 students in Example 2.34. The number of ways the 
four scholarships can be awarded is 


P(10, 4) = 10 · 9 · 8 · 7 = 10!/6! = 5040.
In some problems involving ordered arrangements the fact of 
ordering is not so obvious. 


Example 2.36 The manager of a consulting firm office has 8 
analysts available for job assignments. He must pick 3 analysts and 
assign one to a job in Bartlesville, Oklahoma, one to a job in Pensacola, 
Florida, and one to a job in Houston, Texas.³ In how many ways can he
do this? 


² On most calculators factorials quickly become too large for the display mode, and
factorials like 14! are given in scientific notation with some digits missing. 

³ This is real. Ben Wilson, a consultant and son-in-law of one of the authors, was recently
sent to all three of those cities. 


Solution This is a permutation problem, but it is not quite so
obvious that order is involved. There is no implication that the highest 
ranked analyst will be sent to Bartlesville. However, order is implicit in 
making assignment lists like this one. The manager must fill out the
following form:


City          | Analyst
Bartlesville  |
Pensacola     |
Houston       |
There is no implication that the order of the cities ranks them in any 
way, but the list must be filled out with a first choice on the first line, a 
second choice on the second line and a final choice on the third line. 
This imposes an order on the problem. The total number of ways the job 
assignment can be done is


P(8, 3) = 8 · 7 · 6 = 8!/5! = 336.


2.5.6 Combinations 


In every permutation problem an ordering was stated or implied. In some 
problems, order is not an issue. 


Example 2.37 A city council has 8 members. The council has 
decided to set up a committee of three members to study a zoning issue. 
In how many ways can the committee be selected? 

Solution This problem does not involve order, since members of a 
committee are not identified by order of selection. The committee 
consisting of Smith, Jones and London is the same as the committee 
consisting of London, Smith and Jones. However, there is a way to look 
at the problem using what we already know about ordered arrangements. 
If we wanted to count all the ordered selections of 3 individuals from 8 
council members, the answer would be 


P(8,3) = 336 = number of ordered selections. 


In the 336 ordered selections, each group of 3 individuals is counted 
3! = 6 times. (Remember that 3 individuals can be ordered in 3! ways.) 
Thus the number of unordered selections of 3 individuals is


336/6 = P(8, 3)/3! = 56.


In the language of sets, we would say that the number of possible
three-element subsets of the set of 8 council members is 56, since a subset is a
selection of elements in which order is irrelevant.
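
The argument of Example 2.37 can be checked by listing both kinds of selection and confirming that each committee appears 3! = 6 times among the ordered selections. A sketch; labeling the council members 0 to 7 is an illustrative assumption.

```python
from itertools import combinations, permutations

council = range(8)                           # council members labeled 0 to 7
ordered = list(permutations(council, 3))     # ordered selections of 3
committees = list(combinations(council, 3))  # unordered selections of 3

print(len(ordered), len(committees))         # 336 56
# Each committee is counted 3! = 6 times among the ordered selections.
print(len(ordered) // len(committees))       # 6
```
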


Definition 2.7 A combination of n objects taken r at a time is an 
r-element subset of the original n elements (or, equivalently, an unor- 
dered selection of r of the original n elements). 


The number of combinations of n elements taken r at a time is 
denoted by C(n,r) or $\binom{n}{r}$. The notation $\binom{n}{r}$ has traditionally been
more widely used, but the C(n,r) notation is more commonly used in
mathematical calculators and computer programs, probably because it
can be typed on a single line. We will use both notations in this text.


Example 2.37 above used the reasoning that since any 3-element
subset can be ordered in 3! ways, then


C(8, 3) = $\binom{8}{3}$ = P(8, 3)/3!.


Using Equation (2.8) for P(8, 3), we see that P(8, 3) = 8!/5!, and thus


C(8, 3) = 8!/(3! 5!) = 40,320/720 = 56.


This reasoning applies to the r-element subsets of any n-element
set, leading to the following general counting principle:


Counting Principle for Combinations


$\binom{n}{r}$ = C(n, r) = n!/(r!(n − r)!) = n(n − 1) ··· (n − r + 1)/r!   (2.9)


Special Cases: C(n, n) = C(n, 0) = 1
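
Equation (2.9) in Python, checked against `math.comb` (available in Python 3.8 and later). The helper name `combinations_count` is hypothetical.

```python
from math import comb, factorial

def combinations_count(n, r):
    """Equation (2.9): C(n, r) = n! / (r! (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

print(combinations_count(8, 3), comb(8, 3))                 # 56 56
# Special cases: C(n, n) = C(n, 0) = 1.
print(combinations_count(7, 7), combinations_count(7, 0))   # 1 1
```
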


Technology Note


Any calculator with a factorial function can be used to find C(n,r). 
The TI-83 and TI BA II Plus Professional calculators both have nCr
functions which calculate C(n,r) directly. Microsoft® EXCEL has a
COMBIN function to evaluate C(n, r).


Example 2.38 A company has ten management trainees. The 
company will test a new training method on four of the ten trainees. In 
how many ways can four trainees be selected for testing? 

Solution 


C(10, 4) = 10!/(4! 6!) = (10 · 9 · 8 · 7)/(4 · 3 · 2 · 1) = 210


Example 2.39 It has become a tradition for authors of probability 
and statistics texts to include a discussion of their own state lottery. In 
the Arizona lottery, the player buys a ticket with six distinct numbers on 
it. The numbers are chosen from the numbers 1,2,...,42. What is the 
total number of possible combinations of 6 numbers chosen from 42 
numbers? 

Solution 


C(42, 6) = 42!/(6! 36!) = (42 · 41 · 40 · 39 · 38 · 37)/(6 · 5 · 4 · 3 · 2 · 1) = 5,245,786
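
With `math.comb` (Python 3.8 and later), Examples 2.38 and 2.39 take one call each, with no large factorials to manage by hand. The variable names are illustrative.

```python
from math import comb

trainee_groups = comb(10, 4)   # Example 2.38: 4 of 10 trainees
tickets = comb(42, 6)          # Example 2.39: 6 distinct numbers from 42

print(trainee_groups)          # 210
print(f"{tickets:,}")          # 5,245,786
```
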


2.5.7 Combined Problems 


Many counting problems involve combined use of the multiplication 
principle, permutations, and combinations. 


Example 2.40 A company has 20 male employees and 30 female 
employees. A grievance committee is to be established. The committee 
will have two male members and three female members. In how many 
ways can the committee be chosen? 

Solution We will use the multiplication principle. We have the 
following two tasks: 


Task 1: choose 2 males from 20 
Task 2: choose 3 females from 30 


The number of ways to choose the entire committee is


(Number of ways for Task 1) × (Number of ways for Task 2)
= C(20, 2) · C(30, 3) = 190 · 4060 = 771,400.


Example 2.41 A club has 40 members. Three of the members are
running for office and will be elected president, vice-president and
secretary-treasurer based on the total number of votes received. An
advisory committee with 4 members will be selected from the 37
members who are not running for office. In how many ways can the club
select its officers and advisory committee?

Solution In this problem, Task 1 is to rank the three candidates
for office and Task 2 is to select a committee of 4 from 37 members. The
final answer is


3! · C(37, 4) = 6 · 66,045 = 396,270.
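
Combined problems multiply the counts for the individual tasks, so Examples 2.40 and 2.41 each reduce to one product. A sketch using `math.comb` and `math.factorial` from the Python standard library; the variable names are illustrative.

```python
from math import comb, factorial

# Example 2.40: choose 2 of 20 male and 3 of 30 female employees.
grievance_committees = comb(20, 2) * comb(30, 3)
print(grievance_committees)        # 771400

# Example 2.41: rank the 3 candidates, then choose 4 of the other 37 members.
officers_and_committee = factorial(3) * comb(37, 4)
print(officers_and_committee)      # 396270
```
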


2.5.8 Partitions 


Partitioning refers to the process of breaking a large group into separate 
smaller groups. The combination problems previously discussed are 
simple examples of partitioning problems. 


Example 2.42 A company has 20 new employees to train. The
company will select 6 employees to test a new computer-based training 
package. (The remaining 14 employees will get a classroom training 
course.) In how many ways can the company select the 6 employees for 
the new method? 

Solution The company can select 6 employees from 20 in 
C (20,6) = 38,760 ways. Each possible selection of 6 employees results 
in a partition of the 20 employees into two groups — 6 employees for 
the computer-based training and 14 for the classroom. (We would get an 
identical answer if we solved the problem by selection of the 14 
employees for classroom training.) The number of ways to partition the 
group of 20 into two groups of 6 and 14 is 


$$\binom{20}{6} = \frac{20!}{6!\,14!} = 38{,}760. \qquad\square$$

A similar pattern develops when the partitioning involves more
than two groups. 


Example 2.43 The company in the last example has now decided 
to test televised classes in addition to computer-based training. In how 
many ways can the group of 20 employees be divided into 3 groups with 
6 chosen for computer-based training, 4 for televised classes, and 10 for 
traditional classes? 

Solution The partitioning requires the following two tasks: 


Task 1: select 6 of 20 for computer-based training 
Task 2: select 4 of the remaining 14 for the televised class 


Once Task 2 is completed, only 10 employees will remain and they will 
take the traditional class. Thus the total number of ways to partition the 
employees is 


$$\binom{20}{6}\binom{14}{4} = \frac{20!}{6!\,14!}\cdot\frac{14!}{4!\,10!} = \frac{20!}{6!\,4!\,10!} = 38{,}798{,}760. \qquad\square$$
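To check that the two-task count in Example 2.43 agrees with the single factorial expression, here is a small Python sketch. It is illustrative only and not from the text.

```python
from math import comb, factorial

# Two tasks, as in Example 2.43: pick 6 of 20, then 4 of the remaining 14.
# The last 10 employees form the third group automatically.
sequential = comb(20, 6) * comb(14, 4)
# Closed form 20! / (6! 4! 10!)
closed_form = factorial(20) // (factorial(6) * factorial(4) * factorial(10))
print(sequential, closed_form)  # 38798760 38798760
```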


The number of partitions of 20 objects into three groups of size 6,
4 and 10 is denoted by
$$\binom{20}{6,\,4,\,10}.$$


Example 2.43 showed that $\binom{20}{6,\,4,\,10} = \frac{20!}{6!\,4!\,10!}$ and, similarly,
Example 2.42 showed that $\binom{20}{6,\,14} = \frac{20!}{6!\,14!}$.


The method of Example 2.43 can be used to show that this pattern 
always holds for the total number of partitions. 


Counting Principle for Partitions 


The number of partitions of n objects into k distinct groups of
sizes $n_1, n_2, \ldots, n_k$ is given by


$$\binom{n}{n_1,\,n_2,\,\ldots,\,n_k} = \frac{n!}{n_1!\,n_2!\cdots n_k!} \qquad (2.10)$$


Example 2.44 An insurance company has 15 new employees. The
company needs to assign 4 to underwriting, 6 to marketing, 3 to 
accounting, and 2 to investments. In how many different ways can this 
be done? (Assume that any of the 15 can be assigned to any department.) 

Solution


$$\binom{15}{4,\,6,\,3,\,2} = \frac{15!}{4!\,6!\,3!\,2!} = 6{,}306{,}300. \qquad\square$$
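The counting principle for partitions in Equation (2.10) translates directly into a small function. The name `multinomial` is a hypothetical helper for this sketch, not something defined in the text.

```python
from math import factorial

def multinomial(*sizes):
    """Number of ways to partition sum(sizes) distinct objects into groups
    of the given sizes: n! / (n1! n2! ... nk!)."""
    result = factorial(sum(sizes))
    for size in sizes:
        result //= factorial(size)  # each partial quotient is still an integer
    return result

print(multinomial(4, 6, 3, 2))   # Example 2.44: 6306300
print(multinomial(6, 14))        # Example 2.42: 38760
print(multinomial(6, 4, 10))     # Example 2.43: 38798760
```

Dividing one factorial at a time keeps the intermediate values as exact integers, so the function never has to use floating point.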


Many counting problems can be solved using partitions if they are
looked at in the right way. Exercise 2-39, finding the number of ways to 
rearrange the letters in the word MISSISSIPPI, is a classical problem 
which can be done using partitions. 


2.5.9 Some Useful Identities 
In Example 2.42 we noted that

$$\binom{20}{6} = \binom{20}{14} = \frac{20!}{6!\,14!} = 38{,}760.$$

This is a special case of the general identity C(n, k) = C(n, n − k), or
$$\binom{n}{k} = \binom{n}{n-k} = \frac{n!}{k!\,(n-k)!}.$$
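In Exercise 2-46, the reader is asked to show that the total number
of subsets of an n-element set is $2^n$. Since C(n, k) represents the number
of k-element subsets of an n-element set, we can also find the total
number of subsets of an n-element set by adding up all of the C(n, k).


$$\sum_{k=0}^{n}\binom{n}{k} = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n$$


For example,
$$\binom{4}{0} + \binom{4}{1} + \binom{4}{2} + \binom{4}{3} + \binom{4}{4} = 1 + 4 + 6 + 4 + 1 = 16 = 2^4.$$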
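Both identities can be spot-checked numerically for small n. The Python sketch below is an illustration, not a proof. The ranges tested (n up to 20 for symmetry, up to 10 for the subset sum) are arbitrary choices.

```python
from math import comb

# Symmetry C(n, k) = C(n, n - k): choosing the k objects that go into one
# group is the same as choosing the n - k objects that are left out.
symmetric = all(comb(n, k) == comb(n, n - k) for n in range(21) for k in range(n + 1))

# Adding C(n, k) over k = 0, 1, ..., n counts each subset once, grouped by size.
subset_totals = [sum(comb(n, k) for k in range(n + 1)) for n in range(11)]

print(symmetric)                                     # True
print(subset_totals[4])                              # 16, i.e. 1 + 4 + 6 + 4 + 1
print(subset_totals == [2 ** n for n in range(11)])  # True
```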


In Exercise 2-45, the reader is asked to use counting principles to 
derive the familiar Binomial Theorem 


$$(x + y)^n = \binom{n}{0}x^n + \binom{n}{1}x^{n-1}y + \binom{n}{2}x^{n-2}y^2 + \cdots + \binom{n}{n-1}xy^{n-1} + \binom{n}{n}y^n.$$


This is useful for expansions such as

$$(x + y)^4 = \binom{4}{0}x^4 + \binom{4}{1}x^3y + \binom{4}{2}x^2y^2 + \binom{4}{3}xy^3 + \binom{4}{4}y^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4.$$
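A short Python sketch can list the coefficients of the Binomial Theorem and compare the term-by-term sum with (x + y)^n on integer inputs, where the arithmetic is exact. The helper names are hypothetical and the sketch is illustrative only.

```python
from math import comb

def binomial_terms(n):
    """Coefficients C(n, k) of x^(n-k) y^k, for k = 0, 1, ..., n."""
    return [comb(n, k) for k in range(n + 1)]

def expand_and_evaluate(x, y, n):
    """Evaluate (x + y)^n by summing the Binomial Theorem terms."""
    return sum(c * x ** (n - k) * y ** k for k, c in enumerate(binomial_terms(n)))

print(binomial_terms(4))  # [1, 4, 6, 4, 1]

# Integer spot checks over a small grid of x, y and n
all_match = all(expand_and_evaluate(x, y, n) == (x + y) ** n
                for x in range(-3, 4) for y in range(-3, 4) for n in range(8))
print(all_match)  # True
```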


2.6 Exercises


2.2 The Language of Probability; Sets, Sample Spaces and Events

2-1. From a standard deck of cards a single card is drawn. Let E be
the event that the card is a red face card. List the outcomes in the
event E.

2-2. An insurance company insures buildings against loss due to fire.


(a) What is the sample space of the amount of loss?

(b) What is the event that the amount of loss is strictly between
$1,000 and $1,000,000 (i.e., the amount x is in the
open interval (1,000, 1,000,000))?


2-3. An urn contains balls numbered from 1 to 25. A ball is selected 
and its number noted. 
(a) What is the sample space for this experiment?
(b) If E is the event that the number is odd, what are the 
outcomes in E? 


2-4. An experiment consists of rolling a pair of fair dice, one red and
one green. An outcome is an ordered pair (r, g), where r is the 
number on the red die and g is the number on the green die. List 
all outcomes of this experiment. 


2-5. Two dice are rolled. How many outcomes have a sum of (a) 7; 
(b) 8; (c) 11; (d) 7 or 11? 


2-6. Suppose a family has 3 children. List all possible outcomes for
the sequence of births by sex in this family. 
2.3 Compound Events; Set Notation


2-7. Let S be the sample space for drawing a ball from an urn
containing balls numbered from 1 to 25, and E be the event the
number is odd. What are the outcomes in ${\sim}E$?


2-8. In the sample space for drawing a card from a standard deck, let
A be the event the card is a face card and B be the event the
card is a club. List all the outcomes in $A \cap B$.


2-9. Consider the insurance company that insures against loss due to
fire. Let A be the event the loss is strictly between $1,000 and
$100,000, and B be the event the loss is strictly between
$50,000 and $500,000. What are the events $A \cup B$ and
$A \cap B$?


2-10. An experiment consists of tossing a coin and then rolling a die.
An outcome is an ordered pair, such as (H,3). Let A be the
event the coin shows heads and B be the event the number on
the die is greater than 2. What is $A \cap B$?


2-11. In the experiment of tossing two dice, let E be the event the sum
of the dice is 6 and F be the event both dice show the same
number. List the outcomes in the events $E \cup F$ and $E \cap F$.


2-12. In the sample space for the family with three children in
Exercise 2-6, let E be the event that the oldest child is a girl and F
the event that the middle child is a boy. List the outcomes in E,
F, $E \cup F$ and $E \cap F$.


2.4 Set Identities


2-13. Verify the two distributive laws by drawing the appropriate
Venn diagrams.


2-14. Verify De Morgan's laws by drawing the appropriate Venn
diagrams.
2-15. Let M be the set of students in a large university who are taking
a mathematics class and E be the set taking an economics class.


(a) Give a verbal statement of the identity
${\sim}(M \cup E) = {\sim}M \cap {\sim}E$.


(b) Give a verbal statement of the identity
${\sim}(M \cap E) = {\sim}M \cup {\sim}E$.


2.5 Counting


2-16. An insurance agent sells two types of insurance, life and health.
Of his clients, 38 have life policies, 29 have health policies and
21 have both. How many clients does he have?


2-17. A company has 134 employees. There are 84 who have been
with the company more than 10 years and 65 of those are college
graduates. There are 23 who do not have college degrees and
have been with the company less than 10 years. How many
employees are college graduates?


2-18. A stockbroker has 94 clients who own either stocks or bonds. If 67
own stocks and 52 own bonds, how many own both stocks and
bonds?


2-19. In a survey of 185 university students, 91 were taking a history
course, 75 were taking a biology course, and 37 were taking both.
How many were taking a course in exactly one of these subjects?


2-20. A broker deals in stocks, bonds and commodities. In reviewing his
clients he finds that 29 own stocks, 27 own bonds, 19 own
commodities, 11 own stocks and bonds, 9 own stocks and
commodities, 8 own bonds and commodities, 3 own all three, and
11 have no current investments. How many clients does he have?


2-21. An insurance agent sells life, health and auto insurance. During the
year she met with 85 potential clients. Of these, 42 purchased life
insurance, 40 health insurance, 24 auto insurance, 14 both life and
health, 9 both life and auto, 11 both health and auto, and 2
purchased all three. How many of these potential clients purchased
(a) no policies; (b) only health policies; (c) exactly one type of
insurance; (d) life or health but not auto insurance?
2-22. If an experiment consists of tossing a coin and then rolling a die,
how many outcomes are possible?


2-23. In purchasing a car, a woman has the choice of 4 body styles, 15
color combinations, and 6 accessory packages. In how many
ways can she select her car?


2-24. A student needs a course in each of history, mathematics,
foreign languages and economics to graduate. In looking at the
class schedule he sees he can choose from 7 history classes, 8
mathematics classes, 4 foreign language classes and 7 economics
classes. In how many ways can he select the four classes he
needs to graduate?


2-25. An experiment has two stages. The first stage consists of drawing a
card from a standard deck. If the card is red, the second stage
consists of tossing a coin. If the card is black, the second stage
consists of rolling a die. How many outcomes are possible?


2-26. Let X be the n-element set $\{x_1, x_2, \ldots, x_n\}$. Show that the
number of subsets of X, including X and $\emptyset$, is $2^n$. (Hint: For
each subset A of X, define the sequence $(a_1, a_2, \ldots, a_n)$ such
that $a_i = 1$ if $x_i \in A$ and $a_i = 0$ otherwise. Then count the number of
sequences.)


2-27. An arrangement of 4 letters from the set {A, B, C, D, E, F} is
called a (four-letter) word from that set. How many four-letter
words are possible if repetitions are allowed? How many four-letter
words are possible if repetitions are not allowed?


2-28. Suppose any 7-digit number whose first digit is neither 0 nor 1
can be used as a telephone number. How many phone numbers
are possible if repetitions are allowed? How many are possible
if repetitions are not allowed?


2-29. A row contains 12 chairs. In how many ways can 7 people be
seated in these chairs?


2-30. At the beginning of the basketball season a sportswriter is asked
to rank the top 4 teams of the 10 teams in the PAC-10 conference.
How many different rankings are possible?
2-31. A club with 30 members has three officers: president, secretary
and treasurer. In how many ways can these offices be filled?


2-32. The speaker's table at a banquet has 10 chairs in a row. Of the
ten people to be seated at the table, 4 are left-handed and 6 are
right-handed. To avoid elbowing each other while eating, the
left-handed people are seated in the 4 chairs on the left. In how
many ways can these 10 people be seated?


2-33. Eight people are to be seated in a row of eight chairs. In how
many ways can these people be seated if two of them insist on
sitting next to each other?


2-34. A club with 30 members wants to have a 3-person governing
board. In how many ways can this board be chosen? (Compare
with Exercise 2-31.)


2-35. How many 5-card (poker) hands are possible from a deck of 52
cards?


2-36. How many of those poker hands consist of (a) all hearts; (b) all
cards in the same suit; (c) 2 aces, 2 kings and 1 jack?


2-37. In a class of 15 boys and 13 girls, the teacher wants a cast of 4
boys and 5 girls for a play. In how many ways can she select the
cast?


2-38. The Power Ball lottery uses two sets of balls, a set of white balls
numbered 1 to 55 and a set of red balls numbered 1 to 42. To
play, you select 5 of the white balls and 1 red ball. In how many
ways can you make your selection?


2-39. How many different ways are there to arrange the letters in the
word MISSISSIPPI?


2-40. An insurance company has offices in New York, Chicago and
Los Angeles. It hires 12 new actuaries and sends 5 to New York,
3 to Chicago, and 4 to Los Angeles. In how many ways can this
be done?
2-41. A company has 9 analysts. It has a major project which has been
divided into 3 subprojects, and it assigns 3 analysts to each task.
In how many ways can this be done?


2-42. Suppose that, in Exercise 2-41, the company divides the 9
analysts into 3 teams of 3 each, and each team works on the
whole project. In how many ways can this be done?


2-43. Expand (2s − t}.


2-44. In the expansion of $(2u - 3v)^5$, what is the coefficient of the
term involving и???


2-45. Prove the Binomial Theorem. (Hint: How many ways can you
get the term $x^{n-k}y^k$ from the product of n factors, each of which
is $(x + y)$?)


2-46. Using the Binomial Theorem, give an alternate proof that the
number of subsets of an n-element set is $2^n$.


Sample Actuarial Examination Problem


2-47. An auto insurance company has 10,000 policyholders. Each
policyholder is classified as


(i) young or old;
(ii) male or female; and
(iii) married or single.


Of these policyholders, 3000 are young, 4600 are male, and
7000 are married. The policyholders can also be classified as
1320 young males, 3010 married males, and 1400 young married
persons. Finally, 600 of the policyholders are young married
males.


How many of the company’s policyholders are young, female,
and single?


Chapter 3 
Elements of Probability 


3.1 Probability by Counting for Equally Likely 
Outcomes 


3.1.1 Definition of Probability for Equally Likely Outcomes 


The lengthy Chapter 2 on counting may cause the reader to forget that 
our goal is to find probabilities. In Section 2.1 we stated an intuitively 
appealing definition of probability for situations in which outcomes were 
equally likely. 


Probability by Counting for Equally Likely Outcomes


$$\text{Probability of an event} = \frac{\text{Number of outcomes in the event}}{\text{Total number of possible outcomes}}$$


Chapter 2 gave us methods to count numbers of outcomes. The 
discussion of sets gave us a precise language for discussing collections 
of outcomes. Using the language and notation that have been developed, 
we can now give a more precise definition of probability. 


Definition 3.1 Let E be an event from a sample space S in which
all outcomes are equally likely. The probability of E, denoted P(E), is
defined by

$$P(E) = \frac{n(E)}{n(S)}.$$


Example 3.1 A company has 200 employees. 50 of these employees
are smokers. One employee is selected at random. What is the
probability that the selected employee is a smoker (Sm)?

Solution


$$P(Sm) = \frac{n(Sm)}{n(S)} = \frac{50}{200} = .25 \qquad\square$$


Example 3.2 A standard 52 card deck is shuffled and one card is 
picked at random. What is the probability that the card is (a) a king (K);
(b) a club (C); (c) a king and a club; (d) a heart and a club? 

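Solution


(a) $P(K) = \frac{n(K)}{n(S)} = \frac{4}{52} = \frac{1}{13}$

(b) $P(C) = \frac{n(C)}{n(S)} = \frac{13}{52} = \frac{1}{4}$


(c) The only card in the event $K \cap C$ is the king of clubs. Then

$$P(K \cap C) = \frac{n(K \cap C)}{n(S)} = \frac{1}{52}.$$

(d) Since a single card cannot be both a heart and a club, we have
$n(H \cap C) = 0$. Then
$$P(H \cap C) = \frac{n(H \cap C)}{n(S)} = \frac{0}{52} = 0. \qquad\square$$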


Part (d) of Example 3.2 illustrates an important point. It is
impossible for a single card to be both a heart and a club. If an event is
impossible, n(E) will be 0 and P(E) will also be 0.
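Definition 3.1 can be applied mechanically by listing the sample space and counting. The Python sketch below reproduces Example 3.2 with exact fractions. The card encoding and the helper names (`prob`, `is_king`, and so on) are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # the 52 equally likely outcomes, as (rank, suit)

def prob(event):
    """P(E) = n(E) / n(S), where `event` is a True/False test applied to each card."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def is_king(card):
    return card[0] == "K"

def is_club(card):
    return card[1] == "clubs"

def is_heart(card):
    return card[1] == "hearts"

print(prob(is_king), prob(is_club))               # 1/13 1/4
print(prob(lambda c: is_king(c) and is_club(c)))  # 1/52
print(prob(lambda c: is_heart(c) and is_club(c))) # 0  (an impossible event)
```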


3.1.2 Probability Rules for Compound Events 


Some very useful probability rules can be derived from the counting 
rules in Section 2.5.1. The playing card experiment in Example 3.2 will 
provide simple illustrations of these rules. A standard deck is shuffled 
and a single card is chosen. We are interested in the following events: 


H: the card drawn is a heart    n(H) = 13    P(H) = 1/4
K: the card is a king           n(K) = 4     P(K) = 1/13
C: the card is a club           n(C) = 13    P(C) = 1/4


Example 3.3 Find $P({\sim}C)$.
Solution 


$$P({\sim}C) = \frac{n({\sim}C)}{n(S)} = \frac{52 - 13}{52} = 1 - \frac{13}{52} = 1 - P(C) \qquad\square$$


The general rule for $P({\sim}E)$ can be derived from Equation (2.5),
$n({\sim}E) = n(S) - n(E)$. Dividing by n(S), we obtain


$$P({\sim}E) = \frac{n({\sim}E)}{n(S)} = \frac{n(S)}{n(S)} - \frac{n(E)}{n(S)} = 1 - P(E).$$


This gives a useful identity for $P({\sim}E)$.


Negation Rule


$$P({\sim}E) = 1 - P(E)$$

Another useful rule comes from Equation (2.6), which states
$n(A \cup B) = n(A) + n(B) - n(A \cap B)$.
Dividing by n(S) here, we obtain


$$P(A \cup B) = \frac{n(A \cup B)}{n(S)} = \frac{n(A)}{n(S)} + \frac{n(B)}{n(S)} - \frac{n(A \cap B)}{n(S)} = P(A) + P(B) - P(A \cap B).$$


This gives a useful identity for $P(A \cup B)$.


Disjunction Rule


$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \qquad (3.2)$$


Example 3.4 A single card is drawn at random from a deck. Use
Equation (3.2) to find (a) $P(K \cup C)$; (b) $P(H \cup C)$.

Solution 

(a) $P(K \cup C) = P(K) + P(C) - P(K \cap C)$


$$= \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52}$$

Note that this problem could also have been solved directly
by counting $n(K \cup C)$ and dividing by 52. This should be
obvious, since the rule used was based on counting. We will 
see later that Equation (3.2) still holds in situations where 
counting does not apply. 

(b) $P(H \cup C) = P(H) + P(C) - P(H \cap C)$


$$= \frac{13}{52} + \frac{13}{52} - \frac{0}{52} = \frac{26}{52} \qquad\square$$
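The note in part (a) says the answer can also be obtained by direct counting. A self-contained Python sketch (illustrative only) does both computations for parts (a) and (b). It encodes ranks as 1 to 13 with 13 standing for the king and suits as "C", "D", "H", "S". That encoding is an assumption of the sketch.

```python
from fractions import Fraction
from itertools import product

DECK = list(product(range(1, 14), "CDHS"))  # (rank, suit); rank 13 = king, "C" = clubs

def prob(event):
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def king(card):
    return card[0] == 13

def club(card):
    return card[1] == "C"

def heart(card):
    return card[1] == "H"

# (a) direct count of n(K ∪ C) / 52 versus the disjunction rule
direct_a = prob(lambda c: king(c) or club(c))
rule_a = prob(king) + prob(club) - prob(lambda c: king(c) and club(c))
# (b) hearts and clubs are mutually exclusive, so the last term is 0
direct_b = prob(lambda c: heart(c) or club(c))
rule_b = prob(heart) + prob(club)
print(direct_a, rule_a, direct_b, rule_b)  # 4/13 4/13 1/2 1/2
```

Here 4/13 is 16/52 in lowest terms and 1/2 is 26/52, so the results agree with the fractions above.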


Part (b) of Example 3.4 illustrates a simple situation which occurs
often. $P(H \cap C) = 0$, so that $P(H \cup C) = P(H) + P(C)$. Events like
H and C are called mutually exclusive because the occurrence of one 
excludes the occurrence of the other. Mutually exclusive events were 
defined in Definition 2.4, which is repeated here for reinforcement. 


Definition 2.4 Two events A and B are mutually exclusive if
$A \cap B = \emptyset$.


For mutually exclusive events, $P(A \cap B) = 0$, and the following
addition rule holds.


Addition Rule for Mutually Exclusive Events


If $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.


Some care is needed in identifying mutually exclusive events. For 
example, if a single card is drawn from a deck, hearts and clubs are 
mutually exclusive. In some later problems we will look at the experi- 
ment of drawing two cards from a deck. In this case a first draw of a 
heart does not exclude a second draw of a club. 

The rules developed here can be used in a wide range of applica- 
tions. 


Example 3.5 In Examples 2.21 and 2.22 we looked at a financial 
planner who intended to call on one family from a neighborhood 
association. In that association there were 100 families. 78 families had a 
credit card (C), 50 of the families were paying off a car loan (L), and 41 
of the families had both a credit card and a car loan. The planner is going 
to pick one family at random. What is the probability that the family has 
a credit card or a car loan? 
Solution
$$P(L \cup C) = P(L) + P(C) - P(L \cap C) = \frac{50}{100} + \frac{78}{100} - \frac{41}{100} = .87 \qquad\square$$


The last problem could also have been solved directly by counting
$n(L \cup C) = 87$. The identities used here will prove much more useful
when we encounter problems which cannot be solved by counting. 


3.1.3 More Counting Problems 


It is a simple task to find the probability that a single card drawn from a 
deck is a king. Some probability calculations are a bit more complex. In 
this section we will give examples of individual probability calculations 
which are more interesting. 


Example 3.6 In Example 2.40 we looked at a company with 20 
male employees and 30 female employees. The company is going to 
choose 5 employees at random for drug testing. What is the probability 
that the five chosen employees consist of (a) 3 males and 2 females;
(b) all males; (c) all females? 

Solution The total number of ways to choose 5 employees from 
the entire company is C(50,5). This will be the denominator of the 
solution in each part of this problem. 


$$\binom{50}{5} = 2{,}118{,}760$$


(a) The total number of ways to choose a group of 3 males and
2 females is


$$\binom{20}{3}\binom{30}{2} = 1140 \cdot 435 = 495{,}900.$$


The probability of choosing a group of 3 males and 2
females is therefore


$$\frac{\binom{20}{3}\binom{30}{2}}{\binom{50}{5}} = \frac{495{,}900}{2{,}118{,}760} \approx .234.$$
(b) An all-male group consists of 5 males and 0 females.
Reasoning as in part (a), we find that the probability of
choosing an all-male group is


$$\frac{\binom{20}{5}\binom{30}{0}}{\binom{50}{5}} = \frac{15{,}504}{2{,}118{,}760} \approx .007.$$

(c) Similarly, the probability of choosing an all-female group is


$$\frac{\binom{20}{0}\binom{30}{5}}{\binom{50}{5}} = \frac{142{,}506}{2{,}118{,}760} \approx .067. \qquad\square$$
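The three parts of Example 3.6 share one counting pattern, which a short Python function can capture. The function name `prob_mix` and its defaults (20 items of the first kind, 30 of the second) are illustrative choices for this sketch. In Example 3.7 the same call applies with "defective" in place of "male" and "good" in place of "female."

```python
from math import comb

def prob_mix(from_first, from_second, first=20, second=30):
    """Probability that a random group of size from_first + from_second, drawn
    without replacement from first + second items, contains exactly from_first
    items of the first kind (and from_second of the second kind)."""
    favorable = comb(first, from_first) * comb(second, from_second)
    return favorable / comb(first + second, from_first + from_second)

print(comb(50, 5))              # 2118760 equally likely groups
print(f"{prob_mix(3, 2):.3f}")  # (a) 0.234
print(f"{prob_mix(5, 0):.3f}")  # (b) 0.007
print(f"{prob_mix(0, 5):.3f}")  # (c) 0.067
```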


The above analysis is useful in many different applications. The 
next example deals with testing defective parts; the mathematics is 
identical. 


Example 3.7 A manufacturer has received a shipment of 50 parts. 
Unfortunately, 20 of the parts are defective. The manufacturer is going 
to test a sample of 5 parts chosen at random from the shipment. What is 
the probability that the sample contains (a) 3 defective parts and 2 good 
parts; (b) all defective parts; (c) no defective parts? 


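Solution


(a) $$\frac{\binom{20}{3}\binom{30}{2}}{\binom{50}{5}} = \frac{495{,}900}{2{,}118{,}760} \approx .234$$

(b) $$\frac{\binom{20}{5}\binom{30}{0}}{\binom{50}{5}} = \frac{15{,}504}{2{,}118{,}760} \approx .007$$

(c) $$\frac{\binom{20}{0}\binom{30}{5}}{\binom{50}{5}} = \frac{142{,}506}{2{,}118{,}760} \approx .067 \qquad\square$$


The range of different possible counting problems is very wide.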
The next example is not at all similar to the last two. 


Example 3.8 Four people are subjected to an ESP experiment. 
Each one is asked to guess a number between 1 and 10. What is the 
probability that (a) no two of the four people guess the same number; 
(b) at least two of the four guess the same number? 
Solution 
(a) Each of the four people has the task of choosing from the 
numbers 1 to 10. The total number of ways this can be done 
is the number of ways to perform 4 tasks with 10 
possibilities on each task, which is $10^4$. The number of ways
for the four people to choose 4 distinct numbers is
$10 \cdot 9 \cdot 8 \cdot 7 = P(10,4) = 5040$. (The first person has all 10
numbers to choose, leaving 9 for the second, 8 for the third,
and 7 for the fourth.) Then the probability that none of the
four guess the same number is
$$\frac{P(10,4)}{10^4} = \frac{5{,}040}{10{,}000} = .504.$$
(b) At least two people guess the same number if it is not true
that none of the 4 guess the same number.


$$P(\text{at least two people guess the same}) = 1 - P(\text{no two people guess the same}) = 1 - \frac{P(10,4)}{10^4} = .496 \qquad\square$$


In the previous example there were four people picking numbers 
from 1 to 10. A very similar problem occurs when you ask if any two of 
the four people have the same birthday. In this case, the birthday can be 
thought of as a number between 1 and 365, and we are asking whether 
any two of the people have the same number between 1 and 365. For a
randomly chosen person, any day of the year has a probability of $\frac{1}{365}$ of
being the birthday. The probability that at least two of the four have the
same birthday is


$$1 - \frac{P(365,4)}{365^4} \approx .016.$$
A surprising result appears when there are 40 people in a room. The
probability that at least two have the same birthday is


$$1 - \frac{P(365,40)}{365^{40}} \approx .891.$$


This result provides an interesting classroom demonstration for a teacher 
with 40 students and a little bit of nerve. (Remember that the probability 
of not finding 2 people with the same birthday is about .11.) The 
birthday problem is pursued further in the exercises. 
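The ESP example and the birthday problem use the same formula, 1 − P(N, r)/N^r. A minimal Python sketch is shown below. Like the text, it assumes that every one of the N values (10 numbers, or 365 days with leap years ignored) is equally likely. The function name is hypothetical.

```python
from math import perm

def prob_some_match(people, choices):
    """P(at least two of `people` independent, equally likely picks
    from `choices` values coincide) = 1 - P(choices, people) / choices**people."""
    prob_all_distinct = perm(choices, people) / choices ** people
    return 1 - prob_all_distinct

print(round(prob_some_match(4, 10), 3))    # ESP example: 0.496
print(round(prob_some_match(4, 365), 3))   # 4 birthdays: 0.016
print(round(prob_some_match(40, 365), 3))  # 40 birthdays: 0.891
```

Python's integers have unlimited size, so P(365, 40) and 365^40 are computed exactly before the single division.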

Many more probability problems can be solved using counting. 
Most of the counting examples in this chapter can easily be used to solve 
related probability problems. A practical illustration of this is Example 
2.39, which showed that the Arizona lottery has 5,245,786 possible 
combinations of 6 numbers between 1 and 42. This means that if you 
hold a lottery ticket and are waiting for the winning numbers to be 
drawn, the probability that your numbers will be drawn is 1/5,245,786. 


3.2 Probability When Outcomes Are Not Equally Likely 


The outcomes in an experiment are not always equally likely. We have 
already discussed the example of a biased coin which comes up heads 
65% of the time and tails 35% of the time. Dice can be loaded so that the
faces are not equally likely. Outcomes in real data
studies are rarely equally likely; e.g., the probability of a family
having 5 children is much lower than the probability of having 2
children. In this section we will take a detailed look at a situation in
which outcomes are not equally likely, and develop some of the key
concepts which are used to analyze probability in the general case.


Example 3.9 A large HMO is planning for future expenses. One 
component of their planning is a study of the percentage of births which 
involve more than one child — twins, triplets or more. The study leads 
to the following table:¹


¹ These numbers are adapted from the 2006 edition of Statistical Abstract of the United
States, Table 75.


Number of children    | 1      | 2     | 3
Percent of all births | 96.70% | 3.11% | 0.19%


How will the company assign probabilities to multiple births for future 
planning? 

Solution The table shows that the individual outcomes are not 
equally likely — a result which would not surprise anyone. The table 
also gives us numbers to use as the probabilities of individual outcomes. 


P(1) = .9670    P(2) = .0311    P(3) = .0019


Once probabilities are defined for the individual outcomes, it is a simple 
matter to define the probability of any event. For example, consider the 
event E that a birth has more than one child. In set notation, E = {2,3}. 
We can define 


$$P(E) = P(\{2\} \cup \{3\}) = P(2) + P(3) = .0311 + .0019 = .0330.$$


What we have done here is to apply the addition rule to the mutually 
exclusive outcomes 2 and 3. We can define the probability for any event 
in the sample space S = {1,2,3} in the same way: just add up the
probabilities of the individual outcomes in the event. It is important to
note that


$$P(S) = P(1) + P(2) + P(3) = .9670 + .0311 + .0019 = 1.$$


The sum of the probabilities of all the individual outcomes is 1. □


3.2.1 Assigning Probabilities to a Finite Sample Space


Example 3.9 illustrated a natural method for assigning probabilities to 
events in any finite sample space with n individual outcomes denoted by 
$O_1, O_2, \ldots, O_n$.


(1) Assign a probability $P(O_i) \ge 0$ to each individual outcome
$O_i$. The sum of all the individual outcome probabilities must
be 1.

(2) Define the probability of any event E to be the sum of the
probabilities of the individual outcomes in the event. (This
is an application of the addition rule for mutually exclusive
outcomes.) Then we have

$$P(E) = \sum_{O_i \in E} P(O_i).$$
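The two-step method above can be written as a short Python sketch, using the birth probabilities of Example 3.9. Exact `Fraction` values are used so that the check "the probabilities add to 1" is not disturbed by floating-point rounding. The helper name `event_probability` is illustrative.

```python
from fractions import Fraction

# Step (1): outcome probabilities from Example 3.9 (number of children per birth)
births = {1: Fraction("0.9670"), 2: Fraction("0.0311"), 3: Fraction("0.0019")}
assert all(p >= 0 for p in births.values())
assert sum(births.values()) == 1

# Step (2): P(E) is the sum of P(O_i) over the outcomes O_i in E
def event_probability(dist, event):
    return sum((dist[o] for o in event), Fraction(0))

multiple = event_probability(births, {2, 3})
print(float(multiple))  # 0.033
```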


Example 3.10 An automobile insurance company does a study to 
find the probability for the number of claims that a policyholder will file 
in a year. Their study gives the following probabilities for the individual 
outcomes 0, 1, 2, 3.


Number of claims | 0   | 1   | 2   | 3
Probability      | .72 | .22 | .05 | .01


The individual probabilities here are all non-negative and add to 1. We
can now find the probability of any event by adding probabilities of
individual outcomes. □


3.2.2 The General Definition of Probability 


Not all sample spaces are finite or as easy to handle as those above. To 
handle more difficult situations, mathematicians have developed an 
axiomatic approach that gives the general properties that an assignment 
of probabilities to events must have. If you define a way to assign a 
probability P(E) to any event E, the following axioms should be 
satisfied: 


(1) $P(E) \ge 0$ for any event E

(2) $P(S) = 1$

(3) Suppose $E_1, E_2, \ldots, E_n, \ldots$ is a (possibly infinite) sequence
of events in which each pair of events is mutually exclusive.


Then
$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i).$$


These axioms hold in Examples 3.9 and 3.10. Events have non-negative 
probabilities, individual probabilities add to one, and the addition rule 
works for mutually exclusive events. 

In this text we will not take a strongly axiomatic approach. In 
situations where individual outcomes are not equally likely, we will 
define event probabilities in an intuitively natural way (as we did in the 
preceding examples) and then proceed directly to applied problems. The 
reader can assume that the above axioms hold, and in most cases it will
be obvious that they do. 

One advantage of the axiomatic approach is that the probability 
rules derived for equally likely outcomes can be shown to hold for any 
probability assignment that satisfies the axioms. In any probability 
problem we can use the following rules: 


$P({\sim}E) = 1 - P(E)$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cup B) = P(A) + P(B)$, if A and B are mutually exclusive


The proof of the last rule from the axioms is simple — it is a special 
case of Axiom (3). Proofs of the first two properties from the axioms are 
outlined in the exercises. However, the emphasis here is not on proofs 
from the axioms. The important thing for the reader to know is that when 
probabilities have been properly defined, the above rules can be used. 
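To see the three rules hold for outcomes that are not equally likely, the Python sketch below checks them on the claim-count probabilities used in Examples 3.10 and 3.13. The values .22, .05 and .01 come from Example 3.13, and .72 for zero claims is the value that makes the four probabilities add to 1. The events A and B are hypothetical choices for this illustration.

```python
from fractions import Fraction

claims = {0: Fraction("0.72"), 1: Fraction("0.22"), 2: Fraction("0.05"), 3: Fraction("0.01")}
S = set(claims)

def P(event):
    return sum((claims[o] for o in event), Fraction(0))

A = {1, 2}   # hypothetical event: 1 or 2 claims
B = {2, 3}   # hypothetical event: at least 2 claims

assert P(S - A) == 1 - P(A)                 # negation rule
assert P(A | B) == P(A) + P(B) - P(A & B)   # disjunction rule
assert P({0} | B) == P({0}) + P(B)          # {0} and B are mutually exclusive
print(P(A | B), P(A) + P(B) - P(A & B))     # 7/25 7/25
```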


3.3 Conditional Probability 


In some probability problems a condition is given which restricts your 
attention to a subset of the sample space. When looking at the employees 
of a company, you might want to answer questions about males only or 
females only. When looking at people buying insurance, you might want 
to answer questions about smokers only or non-smokers only. The next 
section gives an example of how to find these conditional probabilities 
using counting. 


3.3.1 Conditional Probability by Counting 


Example 3.11 A health insurance pool includes 200 individuals. 
The insurer is interested in the number of smokers in the pool among 
both males and females. The following table (called a contingency 
table) shows the desired numbers. 
                 | Males (M) | Females (F) | Total
Smokers (S)      | 28        | 22          | 50
Non-smokers (~S) | 72        | 78          | 150


Suppose one individual is to be chosen at random. Counting can be used
to find the probability that the individual is a male, a female, a smoker,
or both.


$$P(M) = \frac{100}{200} = .5 \qquad P(F) = \frac{100}{200} = .5 \qquad P(S) = \frac{50}{200} = .25$$


$$P(M \cap S) = \frac{28}{200} = .14 \qquad P(F \cap S) = \frac{22}{200} = .11$$


Suppose you were told that the selected individual was a male, and asked 
for the probability that the individual was a smoker, given that the 
individual was a male. (The notation for this probability is P(S|M).) 
Since there are only 100 males and 28 of them are smokers, the desired 
probability can be found by dividing the number of male smokers by the 
total number of males. 


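$$P(S|M) = \frac{n(S \cap M)}{n(M)} = \frac{28}{100} = .28$$


This problem can also be solved using probabilities. If we divide the
numerator and denominator of the last fractional expression by 200 (the
total number of individuals), we see that

$$P(S|M) = \frac{n(S \cap M)/200}{n(M)/200} = \frac{P(S \cap M)}{P(M)} = \frac{.14}{.50} = .28.$$


The probability that the selected individual was a smoker, given that the
individual was a female, can be found in the same two ways.


$$P(S|F) = \frac{n(S \cap F)}{n(F)} = \frac{22}{100} = .22 \qquad P(S|F) = \frac{P(S \cap F)}{P(F)} = \frac{.11}{.50} = .22$$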
Note that the above conditional probabilities can be stated in words in
another very natural way. In this group, 28% of the males smoke and
22% of the females smoke. □
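Both ways of finding P(S|M) in Example 3.11, restricting the count and taking a ratio of probabilities, can be compared with a small Python sketch built on the contingency table. The dictionary layout and the helper `n` are assumptions of this illustration.

```python
from fractions import Fraction

# Counts from the contingency table of Example 3.11, keyed by (sex, smoking status)
counts = {("M", "S"): 28, ("F", "S"): 22, ("M", "~S"): 72, ("F", "~S"): 78}
total = sum(counts.values())  # 200

def n(sex=None, smoke=None):
    """Count individuals matching the given sex and/or smoking status."""
    return sum(c for (s, k), c in counts.items()
               if (sex is None or s == sex) and (smoke is None or k == smoke))

# By counting: n(S ∩ M) / n(M)
by_counting = Fraction(n("M", "S"), n("M"))
# By probabilities: P(S ∩ M) / P(M)
by_ratio = Fraction(n("M", "S"), total) / Fraction(n("M"), total)
print(by_counting, by_ratio)            # 7/25 7/25   (that is, .28)
print(Fraction(n("F", "S"), n("F")))    # 11/50       (that is, .22)
```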


3.3.2 Defining Conditional Probability 


Example 3.11 showed two natural ways of finding a conditional probabi- 
lity. The first was based on counting. 


Conditional Probability by Counting for Equally Likely
Outcomes


$$P(A|B) = \frac{n(A \cap B)}{n(B)} \qquad (3.3)$$


When outcomes are not equally likely, this rule does not apply. Then we 
need a definition of conditional probability based on the probabilities 
that we can find. This definition is based on the second approach to 
conditional probability used in the example. 


Definition 3.2 For any two events A and B with P(B) > 0, the conditional
probability of A given B is defined as follows:


Definition of Conditional Probability


$$P(A|B) = \frac{P(A \cap B)}{P(B)} \qquad (3.4)$$


Example 3.12 Example 3.9 assigned probabilities to the number of
children in a single birth. In this example we use the following
probabilities:


P(1) = .9761    P(2) = .0231    P(3) = .0008


Suppose M is the event of a multiple birth, so that M = {2,3}. Find the
probability of the birth of twins, given that there is a multiple birth.
Solution We need to find P(2|M). We first note that 


P(M) = .0231 + .0008 = .0239
and
$P(M \cap 2) = P(2) = .0231$.
Then by Definition 3.2,


$$P(2|M) = \frac{P(M \cap 2)}{P(M)} = \frac{.0231}{.0239} \approx .967.$$
The result tells us that approximately 96.7% of the multiple births are
twins. □


Example 3.13 In Example 3.10, probabilities were given for the 
possible numbers of insurance claims filed by individual policyholders. 


Number of claims     0      1      2      3
Probability         .72    .22    .05    .01


Find the probability that a policyholder files exactly 2 claims, given that
the policyholder has filed at least one claim.

Solution Let C be the event that at least one claim is filed.
Then C = {1, 2, 3} and P(C) = .22 + .05 + .01 = .28. We also need
the value P(2 ∩ C) = P(2) = .05. Then


$$P(2 \mid C) = \frac{P(2 \cap C)}{P(C)} = \frac{.05}{.28} \approx .179.$$


This tells us that approximately 17.9% of the policyholders who file
claims will file exactly 2 claims. □
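Definition 3.2 can be applied mechanically to any finite distribution. The short Python sketch below is an illustration added here, not part of the original examples; the helper names `prob` and `conditional_probability` are hypothetical. It assumes each distribution is stored as a dictionary from outcomes to probabilities, and it takes P(1) = .9761 in Example 3.12 and P(0) = .72 in Example 3.13 as the values that make each distribution sum to 1.

```python
def prob(dist, event):
    """Sum the probabilities of the outcomes that belong to `event`."""
    return sum(p for outcome, p in dist.items() if outcome in event)


def conditional_probability(dist, a, b):
    """P(A|B) = P(A ∩ B) / P(B), as in Definition 3.2."""
    p_b = prob(dist, b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(dist, a & b) / p_b


# Example 3.12: number of children in a single birth; M = {2, 3}.
births = {1: 0.9761, 2: 0.0231, 3: 0.0008}
twins_given_multiple = conditional_probability(births, {2}, {2, 3})

# Example 3.13: number of claims per policyholder; C = {1, 2, 3}.
claims = {0: 0.72, 1: 0.22, 2: 0.05, 3: 0.01}
two_given_some = conditional_probability(claims, {2}, {1, 2, 3})

print(round(twins_given_multiple, 3))  # 0.967
print(round(two_given_some, 3))        # 0.179
```

Events are plain Python sets, so `a & b` is exactly the intersection A ∩ B used in the definition.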


It is often simpler to find conditional probabilities by direct 
counting without using Equation (3.4). 


Example 3.14 A card is drawn at random from a standard deck. 
The card is not replaced. Then a second card is drawn at random from 
the remaining cards. Find the probability that the second card is a king 
(K2), given that the first card drawn was a king (K1).

Solution If a king is drawn first and not replaced, then the deck
will contain 51 cards and only 3 kings for the second draw.


$$P(K2 \mid K1) = \frac{3}{51} \approx .0588$$


In this case the probability formula given by Equation (3.4) would
require much more work to get this simple answer. □

The definition of conditional probability, given by Equation (3.4), 
can be rewritten as a multiplication rule for probabilities. 
Multiplication Rule for Probability


$$P(A \cap B) = P(A \mid B) \cdot P(B) \qquad (3.5)$$


Example 3.15 Two cards are drawn from a standard deck without 
replacement, as in Example 3.14. Find the probability that both are 
kings. 

Solution

$$P(K1 \cap K2) = P(K1) \cdot P(K2 \mid K1) = \frac{4}{52} \cdot \frac{3}{51} \approx .0045$$ □
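The counting rule (3.3) and the multiplication rule can both be checked by brute force. The sketch below is an illustration under the assumption that every ordered pair of distinct cards is equally likely; it lists all 52 · 51 = 2652 ordered draws and counts the ones in K1 and in K1 ∩ K2. Exact `Fraction` arithmetic avoids rounding, and the names `deck`, `outcomes`, and `is_king` are our own.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Ordered pairs of distinct cards = outcomes of two draws without replacement.
outcomes = list(permutations(deck, 2))


def is_king(card):
    return card[0] == "K"


# n(K1): first card is a king; n(K1 ∩ K2): both cards are kings.
n_k1 = sum(1 for first, _ in outcomes if is_king(first))
n_k1_and_k2 = sum(1 for first, second in outcomes if is_king(first) and is_king(second))

p_k2_given_k1 = Fraction(n_k1_and_k2, n_k1)          # counting rule (3.3)
p_both_kings = Fraction(n_k1_and_k2, len(outcomes))  # n(K1 ∩ K2) / n(S)

print(len(outcomes), p_k2_given_k1, p_both_kings)  # 2652 1/17 1/221
```

Note that 1/17 = 3/51 and 1/221 = (4/52)(3/51), so the enumeration agrees with both Example 3.14 and Example 3.15.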


3.3.3 Using Trees in Probability Problems
Experiments such as drawing 2 cards without replacement and checking
whether a king is drawn can be summarized completely using trees. The
tree for Examples 3.14 and 3.15 is shown below.


First Draw   Second Draw   Outcome      Probability
K1           K2            (K1, K2)     (4/52)(3/51)
K1           ~K2           (K1, ~K2)    (4/52)(48/51)
~K1          K2            (~K1, K2)    (48/52)(4/51)
~K1          ~K2           (~K1, ~K2)   (48/52)(47/51)


The first two branches on the left represent the possible first draws, and 
the next branches to the right represent the possible second draws. We 
write the probability of each first draw on its branch and the conditional 
probability of each second draw on its branch. At the end of each final 
branch we write the resulting 2-card outcome and the product of the two
branch probabilities. The multiplication rule tells us that the resulting
product is the probability of the final 2-card outcome. For example, the
product of the two fractions on the topmost branch is P(K1 ∩ K2), as
calculated in the previous example.

The tree provides a rapid and efficient way to display all outcome 
pairs and their probabilities. This simplifies some harder problems, as 
the next example shows. 


Example 3.16 Two cards are drawn at random from a standard 
deck without replacement. Find the probability that exactly one of the 
two cards is a king. 

Solution The only pairs with exactly one king are (K1, ~K2) and
(~K1, K2). The desired probability is


$$P[(K1, {\sim}K2)] + P[({\sim}K1, K2)] = \frac{4}{52} \cdot \frac{48}{51} + \frac{48}{52} \cdot \frac{4}{51} \approx .145.$$ □
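The tree translates directly into code: each branch carries the product of the probabilities along its path, and the probability of an event is the sum over the branches it contains. The sketch below uses a hypothetical helper, `two_draw_tree`, whose default parameters (4 kings, 52 cards) are assumptions matching this example; it is an illustration, not a procedure from the text.

```python
from fractions import Fraction


def two_draw_tree(kings=4, deck_size=52):
    """Branches of the two-draw tree as (first, second, path probability)."""
    others = deck_size - kings
    p_k1 = Fraction(kings, deck_size)
    p_not_k1 = Fraction(others, deck_size)
    return [
        ("K1", "K2", p_k1 * Fraction(kings - 1, deck_size - 1)),
        ("K1", "~K2", p_k1 * Fraction(others, deck_size - 1)),
        ("~K1", "K2", p_not_k1 * Fraction(kings, deck_size - 1)),
        ("~K1", "~K2", p_not_k1 * Fraction(others - 1, deck_size - 1)),
    ]


branches = two_draw_tree()

# Add the final probabilities of all branches containing exactly one king.
exactly_one_king = sum(
    p for first, second, p in branches
    if (first == "K1") + (second == "K2") == 1
)

print(exactly_one_king, round(float(exactly_one_king), 3))  # 32/221 0.145
```

Because the four branches cover every possible 2-card outcome, their probabilities add to 1, which is a quick sanity check on any tree.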


An intuitive description of our method for finding the probability 
of exactly one king would be to say that we have added up the final 
probabilities of all tree branches which contain exactly one king. This 
technique will be explored further in Section 3.5 on Bayes’ Theorem. 


3.3.4 Conditional Probabilities in Life Tables 


Life tables give a probability of death for any given year of life. For 
example, Bowers, et al. [2] has a life table for the total population of the 
United States, 1979-1981. That table gives, for each integral age x, the
estimated probability that an individual at integral age x will die in the
next year. This probability is denoted by $q_x$.


$$q_x = P(\text{an individual aged } x \text{ will die before age } x + 1)$$


For example,


$$q_{25} = .00132 = P(\text{a 25-year-old will die before age 26})$$


and

$$q_{57} = .01059 = P(\text{a 57-year-old will die before age 58}).$$


Life tables are used in the pricing of insurance, the calculation of life
expectancies, and a wide variety of other actuarial applications. They are
mentioned here because the probabilities in them are really conditional.
For example, $q_{25}$ is the probability that a person dies before age 26,
given that the person has survived to age 25.


3.4 Independence 


3.4.1 An Example of Independent Events; The Definition of
Independence


Example 3.17 A company specializes in coaching people to pass 
a major professional examination. The company had 200 students last 
year. Their pass rates, broken down by sex, are given in the following 
contingency table. 


          Males   Females   Total
Pass        54       66      120
Fail        36       44       80
Total       90      110      200


This table can be used to calculate various probabilities for an individual
selected at random from the 200 students.


$$P(\text{Pass}) = \frac{120}{200} = .60$$


$$P(\text{Pass} \mid \text{Male}) = \frac{54}{90} = .60 \qquad P(\text{Pass} \mid \text{Female}) = \frac{66}{110} = .60$$

These probabilities show that the overall pass rate was 60%, and that the
pass rate for males and the pass rate for females were also 60%. When
males and females have the same probability of passing, we say that
passing is independent of gender.


The reasoning here leads to the following definition. 


Definition 3.3 Two events A and B are independent if 


P(A|B) = P(A). 
In the above example, the events Pass and Male are independent
because P(Pass|Male) = P(Pass). When events are not independent 
they are called dependent. 
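Independence can be checked directly from a contingency table. The sketch below stores the counts from the table of Example 3.17 and compares P(Pass|Male) and P(Pass|Female) with P(Pass). It uses exact fractions because an equality test on floating-point ratios can, in general, be spoiled by rounding; the function names are illustrative only.

```python
from fractions import Fraction

# Counts from the contingency table of Example 3.17: counts[result][sex].
counts = {
    "Pass": {"Male": 54, "Female": 66},
    "Fail": {"Male": 36, "Female": 44},
}
total = sum(sum(row.values()) for row in counts.values())


def p_pass():
    """P(Pass): passing students over all students."""
    return Fraction(sum(counts["Pass"].values()), total)


def p_pass_given(sex):
    """P(Pass | sex) by counting within that column of the table."""
    column_total = sum(counts[result][sex] for result in counts)
    return Fraction(counts["Pass"][sex], column_total)


for sex in ("Male", "Female"):
    print(sex, p_pass_given(sex), p_pass_given(sex) == p_pass())
# Male 3/5 True
# Female 3/5 True
```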

In Example 3.11 we looked at an insurance pool in which there 
were males and females and smokers and non-smokers. For that pool, 
P(S) = .25 but P(S|M) = .28. The events S and M are dependent.
(This was intuitively obvious in the original example. 28% of the males
and only 22% of the females smoked. The probability of being a smoker
depended on the sex of the individual.)

In many cases it appears obvious that two events are independent
or dependent. For example, if a fair coin is tossed twice, most people
agree that the second toss is independent of the first. This can be proven.


Example 3.18 The full sample space for two tosses of a fair coin is
{HH, HT, TH, TT}.


The four outcomes are equally likely. Let H1 be the event that the first
toss is a head, and H2 the event that the second toss is a head. Show that
the events H1 and H2 are independent.

Solution We have H2 = {HH, TH} and P(H2) = .50. Given
that the first toss is a head, the sample space is reduced to the two
outcomes {HH, HT}. Only one of these outcomes, HH, has a head as
the second toss. Thus P(H2|H1) = .50. Then P(H2|H1) = P(H2), and
thus H1 and H2 are independent. □
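The counting argument in Example 3.18 can be reproduced by listing the sample space. This small Python sketch (all names are illustrative) builds {HH, HT, TH, TT} with `itertools.product`, forms H1 and H2, and compares P(H2|H1) with P(H2). It also confirms the product form P(H1 ∩ H2) = P(H1) · P(H2), which is the multiplication rule for independent events discussed in Section 3.4.3.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses: "HH", "HT", "TH", "TT", all equally likely.
sample_space = ["".join(toss) for toss in product("HT", repeat=2)]

h1 = {s for s in sample_space if s[0] == "H"}  # first toss is a head
h2 = {s for s in sample_space if s[1] == "H"}  # second toss is a head


def p(event):
    """Probability by counting equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


p_h2_given_h1 = Fraction(len(h1 & h2), len(h1))  # counting rule (3.3)

print(sample_space)                 # ['HH', 'HT', 'TH', 'TT']
print(p(h2), p_h2_given_h1)         # 1/2 1/2
print(p(h1 & h2) == p(h1) * p(h2))  # True
```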


Coin-tossing problems are best approached by assuming that two 
successive tosses of a fair coin are independent. The counting argument
above shows that this is true.

There is another common problem in which independence and 
dependence are intuitively clear. If two cards are drawn from a standard 
deck without replacement of the first card, the probability for the second 
draw clearly depends on the outcome of the first. If a card is drawn and
then replaced for the second random draw, the probability for the second 
draw is clearly independent of the first draw. 
3.4.3 The Multiplication Rule for Independent Events
The general multiplication rule for any two events, given by Equation
(3.5), is

$$P(A \cap B) = P(A \mid B) \cdot P(B).$$


If A and B are independent, then P(A|B) = P(A) and the multiplication 
rule is simplified: 


Multiplication Rule for Independent Events 


$$P(A \cap B) = P(A) \cdot P(B)$$


In some texts this identity is taken as the definition of independence and 
our definition is then derived. This multiplication rule makes some 
problems very easy if independence is immediately recognized. 


Example 3.19 A fair coin is tossed twice. What is the probability 
of tossing two heads? 
Solution The two tosses are independent. The multiplication rule
yields
$$P(HH) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}.$$ □


The multiplication rule extends to more than two independent
events. If a fair coin is tossed three times, the three tosses are
independent and

$$P(HHH) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8}.$$

In fact, the definition of independence for n > 2 events states that the
multiplication rule holds for any subset of the n events.
Definition 3.4 The events $A_1, A_2, \ldots, A_n$ are independent if

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \times P(A_{i_2}) \times \cdots \times P(A_{i_k})$$

for $1 \le i_1 < i_2 < \cdots < i_k \le n$.
The situation is more complicated than it appears. Exercise 3-30
will show that it is possible to have three events A, B and C such that
each pair of events is independent but the three events together are not
independent. Independence may be tricky to check for in some special
problems. However, in this text there will be many problems where
independence is intuitively obvious or simply given as an assumption of the
problem. In those cases, the general multiplication rule should be applied
immediately.
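Definition 3.4 requires the product rule for every subcollection, not just for pairs. The sketch below checks that condition by brute force over all subsets of size 2 or more. To illustrate the point without giving away Exercise 3-30, it uses a different standard example that is our own addition: for two tosses of a fair coin, A = the first toss is a head, B = the second toss is a head, and C = the tosses differ (exactly one head). Each pair is independent, but P(A ∩ B ∩ C) = 0 while P(A) · P(B) · P(C) = 1/8. The code assumes Python 3.8 or later for `math.prod`.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# Two tosses of a fair coin; the four outcomes are equally likely.
sample_space = ["".join(toss) for toss in product("HT", repeat=2)]


def p(event):
    return Fraction(len(event), len(sample_space))


def mutually_independent(events):
    """Definition 3.4: the product rule must hold for every subcollection."""
    for k in range(2, len(events) + 1):
        for subset in combinations(events, k):
            intersection = set(sample_space).intersection(*subset)
            if p(intersection) != prod(p(e) for e in subset):
                return False
    return True


a = {s for s in sample_space if s[0] == "H"}   # first toss is a head
b = {s for s in sample_space if s[1] == "H"}   # second toss is a head
c = {s for s in sample_space if s[0] != s[1]}  # exactly one head

print([mutually_independent(pair) for pair in ([a, b], [a, c], [b, c])])  # [True, True, True]
print(mutually_independent([a, b, c]))  # False
```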


Example 3.20 A fair coin is tossed 30 times. What is the probability
of tossing 30 heads in a row?
Solution


$$\left(\frac{1}{2}\right)^{30} = \frac{1}{1{,}073{,}741{,}824}$$

Don’t bet on it! □


Example 3.21 A student is taking a very difficult professional
examination. Unlimited tries are allowed, and many people do not pass
without first failing a number of times. The probability that this student
will pass on any particular attempt is .60. Assume that successive
attempts at the exam are independent. (If the exam is unreasonably
tricky and changes every time, this may not be a bad assumption.) What
is the probability that the student will not pass until his third attempt?

Solution

P(Fail and Fail and Pass) = (.40)(.40)(.60) = .096 □
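Examples 3.20 and 3.21 both apply the multiplication rule for independent events to a sequence of trials. A minimal sketch, assuming exact fractions are acceptable; the helper name `prob_all` is hypothetical.

```python
from fractions import Fraction


def prob_all(probabilities):
    """Multiplication rule for independent events: multiply the probabilities."""
    result = Fraction(1)
    for q in probabilities:
        result *= q
    return result


# Example 3.20: thirty heads in a row with a fair coin.
thirty_heads = prob_all([Fraction(1, 2)] * 30)

# Example 3.21: fail, fail, then pass, with P(pass) = .60 on each attempt.
p_pass = Fraction(60, 100)
pass_on_third = prob_all([1 - p_pass, 1 - p_pass, p_pass])

print(thirty_heads)          # 1/1073741824
print(float(pass_on_third))  # 0.096
```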


Example 3.22 An insurance company has written two life 
insurance policies for a husband and wife. Policy 1 pays $10,000 to their 
children if both husband and wife die during this year. Policy 2 pays 
$100,000 to the surviving spouse if either husband or wife dies during 
this year. The probability that the husband will die this year ($H_D$) is
.011. The probability that the wife will die this year ($W_D$) is .008. Find
the probability that each policy will pay a benefit this year. You are to
assume that the deaths of husband and wife are independent.

Solution

Policy 1: The probability of payment is

$$P(H_D \cap W_D) = (.011)(.008) = .000088.$$


Policy 2: The probability of payment is

$$P(H_D \cup W_D) = P(H_D) + P(W_D) - P(H_D \cap W_D) = .011 + .008 - .000088 = .018912.$$ □
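Example 3.22 combines the multiplication rule for independent events with the addition rule. The sketch below uses plain floats, and its variable names are illustrative. As an extra check that is not part of the original solution, it also computes Policy 2 as 1 minus P(neither dies), which relies on the standard fact that complements of independent events are also independent.

```python
import math

p_h = 0.011  # P(H_D): the husband dies this year
p_w = 0.008  # P(W_D): the wife dies this year

# Policy 1 pays only if both die; independence lets us multiply.
p_policy1 = p_h * p_w

# Policy 2 pays if either dies: addition rule for P(H_D ∪ W_D).
p_policy2 = p_h + p_w - p_policy1

# Cross-check: 1 - P(neither dies), using independence of the complements.
p_policy2_check = 1 - (1 - p_h) * (1 - p_w)

print(f"{p_policy1:.6f} {p_policy2:.6f}")        # 0.000088 0.018912
print(math.isclose(p_policy2, p_policy2_check))  # True
```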
3.5 Bayes’ Theorem
3.5.1 Testing a Test: An Example 


In Example 2.27, we showed how to list the possible outcomes of a 
disease test using a tree. In the discussion, we mentioned that disease 
tests can have their problems. A test can indicate that you have the 
disease when you don’t (a false positive) or indicate that you are free of 
the disease when you really have it (a false negative). Most of us are 
subjected to other tests that have similar problems — placement tests, 
college and graduate school admission tests, and job screening tests are a 
few examples. Bayes’ Theorem and the related probability formulas 
presented in this section are quite useful in analyzing how well such 
tests are working, and we will begin discussion of Bayes’ Theorem with 
a continuation of the disease-testing example. (This material has a wide 
variety of other applications.) 


Example 3.23 The outcomes of interest in a disease test, from 
Example 2.27, are the following: 
D: the person tested has the disease
~D: the person tested does not have the disease
Y: the test is positive
N: the test is negative


In this example, we will consider a hypothetical disease test which most
people would think of as “95% accurate”, defined as follows:


(a) P(Y|D) = .95; in words, if you have the disease there is a
.95 probability that the test will be positive.

(b) P(N|~D) = .95; if you don't have the disease the probability
is .95 that the test will be negative.


Only 1% of all people actually have the disease, so P(D) = .01. The
tree for this test (with branch probabilities) is summarized below.


First Branch   Second Branch   Outcome    Probability
D (.01)        Y (.95)         (D, Y)     .0095
D (.01)        N (.05)         (D, N)     .0005
~D (.99)       Y (.05)         (~D, Y)    .0495
~D (.99)       N (.95)         (~D, N)    .9405


The tree illustrates that the test is misleading in some cases. 5% of 
individuals with the disease will test negative, and 5% of the individuals 
who do not have the disease will test positive. There are two important 
questions to ask about this test. 


(a) What percentage of the population will test positive? This 
percentage is given by P(Y). 

(b) Suppose you know that someone has tested positive for the 
disease. What is the probability that the person does not 
actually have the disease? (This probability is P(~D|Y).) 


Solution 
(a) P(Y) is just the sum of the probabilities of all branches
ending in Y.


$$P(Y) = P[(D, Y)] + P[({\sim}D, Y)] = .0095 + .0495 = .0590$$


(b) Note that the event ~D ∩ Y corresponds to the branch
(~D, Y).


$$P({\sim}D \mid Y) = \frac{P({\sim}D \cap Y)}{P(Y)} = \frac{P[({\sim}D, Y)]}{P(Y)} = \frac{.0495}{.0590} \approx .839$$


The practical information here is interesting. The “95% accurate” test
will classify 5.9% of the population as positives, a classification
which can be alarming and stressful. 83.9% of the individuals who tested
positive will not actually have the disease. □
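The tree calculation of Example 3.23 can be sketched in a few lines of Python: multiply along each branch, then add or divide branch probabilities. The dictionary layout and the variable names are assumptions made for this illustration.

```python
p_d = 0.01                # P(D): 1% of people have the disease
p_pos_given_d = 0.95      # P(Y|D)
p_neg_given_not_d = 0.95  # P(N|~D)

# Multiply along each branch of the tree to get the four outcome probabilities.
branches = {
    ("D", "Y"): p_d * p_pos_given_d,
    ("D", "N"): p_d * (1 - p_pos_given_d),
    ("~D", "Y"): (1 - p_d) * (1 - p_neg_given_not_d),
    ("~D", "N"): (1 - p_d) * p_neg_given_not_d,
}

# (a) P(Y): add every branch ending in Y.
p_y = sum(p for (status, result), p in branches.items() if result == "Y")

# (b) P(~D|Y): the (~D, Y) branch divided by all branches ending in Y.
p_not_d_given_y = branches[("~D", "Y")] / p_y

print(f"P(Y) = {p_y:.4f}, P(~D|Y) = {p_not_d_given_y:.3f}")
# P(Y) = 0.0590, P(~D|Y) = 0.839
```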


In Example 3.23 we used Bayes' Theorem and the law of total 
probability without mentioning them by name. In the next section we 
will state these useful rules. 


3.5.2 The Law of Total Probability; Bayes’ Theorem 


In Example 3.23 we found P(Y) by breaking the event Y into two
separate branch outcomes, so


$$Y = \{(D, Y), ({\sim}D, Y)\},$$

which enabled us to write

$$P(Y) = P[(D, Y)] + P[({\sim}D, Y)].$$

Using set notation, we could rewrite the last two identities as

$$Y = (D \cap Y) \cup ({\sim}D \cap Y)$$

and

$$P(Y) = P(D \cap Y) + P({\sim}D \cap Y).$$


Note that D ∪ ~D = S. The events D and ~D partition the sample
space into two mutually exclusive pieces. Then the events (D ∩ Y) and
(~D ∩ Y) break the event Y into two mutually exclusive pieces. This is
illustrated in the following figure.


[Figure: Sample Space]


The events D and ~D are said to partition the sample space. This is a
special case of a more general definition. 
Definition 3.5 The events $A_1, A_2, \ldots, A_n$ partition the sample
space S if $A_1 \cup A_2 \cup \cdots \cup A_n = S$ and $A_i \cap A_j = \emptyset$ for $i \neq j$.


[Figure: Sample Space divided into $A_1, A_2, \ldots, A_n$]


The law of total probability says that a partition of the sample 
space will lead to a partition of any event E into mutually exclusive 
pieces. 


$$E = (A_1 \cap E) \cup (A_2 \cap E) \cup \cdots \cup (A_n \cap E)$$
Then we can write P(E) as the sum of the probabilities of those pieces. 


Law of Total Probability 


Let E be an event. If $A_1, A_2, \ldots, A_n$ partition the sample space,
then


$$P(E) = P(A_1 \cap E) + P(A_2 \cap E) + \cdots + P(A_n \cap E). \qquad (3.7)$$


This is the law we used intuitively when we wrote 


$$Y = (D \cap Y) \cup ({\sim}D \cap Y) = (D, Y) \cup ({\sim}D, Y)$$
and
$$P(Y) = P(D \cap Y) + P({\sim}D \cap Y)$$
In that case n = 2, $A_1 = D$, and $A_2 = {\sim}D$.

The law of total probability can be rewritten in a useful way. In the 
disease testing example, the probabilities P[(D,Y)] and P[(~D,Y)] 
appeared to be read directly from the tree, but they were actually 
obtained by multiplying along branches. 


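$$P(D \cap Y) = P(D) \cdot P(Y \mid D) \qquad P({\sim}D \cap Y) = P({\sim}D) \cdot P(Y \mid {\sim}D)$$
Thus when we found P(Y), we were really writing


$$P(Y) = P(D \cap Y) + P({\sim}D \cap Y) = P(D) \cdot P(Y \mid D) + P({\sim}D) \cdot P(Y \mid {\sim}D).$$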
When we calculated P(~D|Y), our reasoning could be summarized as


$$P({\sim}D \mid Y) = \frac{P({\sim}D \cap Y)}{P(Y)} = \frac{P({\sim}D) \cdot P(Y \mid {\sim}D)}{P(D) \cdot P(Y \mid D) + P({\sim}D) \cdot P(Y \mid {\sim}D)}$$


The last expression on the right is referred to as Bayes’ Theorem. It
looks complicated, but can be stated simply in terms of trees.


$$P({\sim}D \mid Y) = \frac{\text{Probability for the } ({\sim}D, Y) \text{ branch}}{\text{Sum of probabilities for all branches ending in } Y}$$


The general statement of Bayes’ Theorem is simply an extension 
of the above reasoning for a partition of the sample space into n events. 


Bayes’ Theorem 


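Let E be an event. If $A_1, A_2, \ldots, A_n$ partition the sample space,
then for each i


$$P(A_i \mid E) = \frac{P(E \cap A_i)}{P(E)} = \frac{P(A_i) \cdot P(E \mid A_i)}{P(A_1) \cdot P(E \mid A_1) + P(A_2) \cdot P(E \mid A_2) + \cdots + P(A_n) \cdot P(E \mid A_n)} \qquad (3.8)$$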


We illustrate the use of Bayes' Theorem for a partition of the sample 
space into 3 events in the next example. 


Example 3.24 An insurer has three types of auto insurance
policyholders. 50% of the policyholders are low risk (L). The probability
that a low-risk policyholder will file a claim in a given year is .10.
Another 30% of the policyholders are moderate risk (M). The
probability that a moderate-risk policyholder will file a claim in a given
year is .20. Finally, 20% of the policyholders are high risk (H). The
probability that a high-risk policyholder will file a claim in a given year 
is .50. A policyholder files a claim this year. Find the probability that he 
is a high-risk policyholder. 
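Solution The given probabilities lead to the following tree.


Risk Class   Claim      Outcome    Probability
L (.50)      C (.10)    (L, C)     .05
L (.50)      ~C (.90)   (L, ~C)    .45
M (.30)      C (.20)    (M, C)     .06
M (.30)      ~C (.80)   (M, ~C)    .24
H (.20)      C (.50)    (H, C)     .10
H (.20)      ~C (.50)   (H, ~C)    .10


$$P(H \mid C) = \frac{P(H \cap C)}{P(C)} = \frac{.10}{.05 + .06 + .10} = \frac{.10}{.21} \approx .476$$


This shows that approximately 47.6% of the claims are filed by high-risk
drivers. □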
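The same steps work for any partition. The sketch below is a hypothetical helper, `bayes`, that takes the prior probabilities P(A_i) and the conditional probabilities P(E|A_i), forms the branch products, adds them as in the law of total probability (3.7), and divides as in Bayes’ Theorem (3.8). Applied to Example 3.24 with exact fractions, it reproduces P(C) = .21 and P(H|C) ≈ .476.

```python
from fractions import Fraction


def bayes(priors, likelihoods):
    """Return P(E) and the posteriors P(A_i | E) for a partition A_1, ..., A_n.

    priors[i] is P(A_i) and likelihoods[i] is P(E | A_i).
    """
    branch = [prior * lik for prior, lik in zip(priors, likelihoods)]  # P(A_i ∩ E)
    p_e = sum(branch)                                                   # law of total probability (3.7)
    return p_e, [b / p_e for b in branch]                               # Bayes' Theorem (3.8)


classes = ["L", "M", "H"]
priors = [Fraction(50, 100), Fraction(30, 100), Fraction(20, 100)]
claim_rates = [Fraction(10, 100), Fraction(20, 100), Fraction(50, 100)]

p_claim, posteriors = bayes(priors, claim_rates)
posterior = dict(zip(classes, posteriors))

print(p_claim, posterior["H"], round(float(posterior["H"]), 3))  # 21/100 10/21 0.476
```

Since the posteriors are just the branch probabilities rescaled by P(E), they always add to 1, mirroring the tree reading of the formula.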


Note that in a typical problem it is simpler to draw the tree and use 
branch probabilities than it is to memorize the formula and try to 
substitute numbers into it. For many people the tree provides the
intuition to understand and memorize the formula.
3.6 Exercises


3.1 Probability by Counting for Equally Likely Outcomes


3-1. You toss a fair coin 3 times. What is the probability that you get
2 heads and 1 tail? (Note: All possible outcomes for this experiment
were given in a tree in Section 2.5.3.)


3-2. If a fair coin is tossed 3 times, what is the probability of getting
at least 1 head?


3-3. An urn contains 3 red balls, 7 green balls and 6 blue balls. If a
ball is selected at random from the urn, what is the probability
that it is (a) red; (b) not green?


3-4. A consulting company has 68 employees. Of these 21 have
degrees in mathematics, 33 have degrees in economics and 7
have degrees in both. What is the probability that an employee
chosen at random has a degree in either mathematics or
economics?


3-5. If a pair of dice is rolled, what is the probability that the sum of
the two dice is (a) 7; (b) 11; (c) less than 5?


3-6. An insurance agent has 78 clients. Of these 45 have life
insurance, 32 have auto insurance, and 16 have both types. What is
the probability that a client chosen at random has neither life nor
auto insurance?


3-7. An urn contains 4 red balls and 6 green balls. Three balls are
selected at random. What is the probability that (a) all 3 are red;
(b) 1 is red and 2 are green; (c) all 3 are the same color?


3-8. A computer company has a shipment of 40 computer
components of which 5 are defective. If 4 components are chosen at
random to be tested, what is the probability that (a) all are good;
(b) 2 are good and 2 are defective?
3-9. Ten people, 5 men and 5 women, are to be seated in a row of
ten chairs. What is the probability that the men and women end
up in alternate chairs?


3-10. Eight people were all born in January. What is the probability that
at least 2 of them have the same birthday?


3-11. What is the probability that at least 2 of a group of 4 people
were born on the same day of the week?


3-12. Four balls are picked at random from an urn containing 5 red balls
and 6 blue balls. What is the probability that you get balls of
both colors?


3-13. A 5-card poker hand is dealt from a standard deck of cards.
What is the probability that you get a full house (3 of one kind
plus a different pair, such as KKK55)?


3-14. If a poker hand is dealt, what is the probability that you get 2
pairs (e.g., QQ993)?


3-15. The odds for an event E are defined as the ratio of P(E) to P(~E).
Odds are generally written as the ratio of two integers, such as
5:4, which is read “5 to 4”. The odds against E are given by the
reverse ratio (i.e., 4:5). If a pair of dice is rolled, what are (a)
the odds for a 7; (b) the odds against an 11?


3-16. If the odds for E are known, say r:s, then P(E) = r/(r + s). If
the odds against E are a:b, what is P(E)?


3.2 Probability When Outcomes Are Not Equally Likely

3-17. Prove P(~E) = 1 − P(E).


3-18. Prove P(A ∪ B) = P(A) + P(B) − P(A ∩ B) using the axioms
in Section 3.2.2. Hint: First show that


A ∪ B = (A ∩ ~B) ∪ (A ∩ B) ∪ (~A ∩ B).
3-19. A four-year college has the following enrollment by class:
27.8% freshman, 26.3% sophomore, 24.4% junior and 21.5%
senior. What is the probability that a student chosen at random is
a junior or senior?


3-20. An auto insurance company finds that in the past 10 years 22%
of its policyholders have filed liability claims, 37% have filed
comprehensive claims, and 13% have filed both types of claims.
What is the probability that a policyholder chosen at random has
not filed a claim of either kind?


3-21. A teacher's grade distribution for the year is as follows: A,
13.1%; B, 27.8%; C, 31.2%; D, 8.9%; E, 9.4%; and W, 9.6%.
What is the probability that a student of this teacher got (a) a
grade of C or better; (b) a grade of D or E?


3-22. In a survey of college students it was discovered that 37% had
received flu shots, 58% had a skin test for tuberculosis, and 21%
had received neither. What is the probability that a student
received both?


3.3 Conditional Probability


3-23. In Exercise 3-21 what is the probability that a randomly selected
student got an A, given that she got a grade of C or better?


3-24. In the first quarter of a year, a company's records showed that
63.5% of its employees missed no work, 23.7% missed one day
of work, 8.1% missed two days, and 4.7% missed three days.
What is the probability that an employee who missed work
missed only one day?


3-25. An insurance company classifies its claims as low if they are
under $10,000, and high otherwise. During the year 79.2% of its
policyholders filed no claims, 16.9% filed low claims, and 3.9%
filed high claims. If a policyholder filed a claim, what is the
probability that it was a low claim?


3-26. Two cards are drawn from a standard deck without replacement.
What is the probability that (a) both are hearts; (b) neither is a
heart; (c) exactly one is a heart?

3-27. For the experiment of tossing a single fair coin 3 times, what is
the probability of getting exactly 2 heads, given that you get at
least one head?


3-28. For the experiment in Exercise 3-27 what is the probability of
getting exactly 2 heads, given that the first toss is a head?


3-29. Three cards are drawn from a standard deck. What is the
probability that all three are hearts, given that at least two of
them are hearts?


3.4 Independence


3-30. Let X be the experiment of drawing a single card from a deck.
Let A be the event the card is a spade or a heart, B be the event
it is a spade or a diamond, and C be the event it is a spade or a
club. Show that each of the pairs (A, B), (A, C) and (B, C) is
independent. Show that P(A ∩ B ∩ C) ≠ P(A) · P(B) · P(C).


3-31. Two cards are drawn from a standard deck with replacement.
Let A1 be the event the first card is an ace and A2 be the event
the second card is an ace. Show that A1 and A2 are independent.


3-32. Let S be the sample space for rolling a single die. Let
A = {1, 2, 3, 4}, B = {2, 3, 4}, and C = {3, 4, 5}. Which of the
pairs (A, B), (A, C) and (B, C) are independent?


3-33. A company needs some of its employees for a task that requires
that they not be color blind. In testing them it finds that 7 of the
130 men are color blind and 2 of the 170 women are color blind.
Are the events male and color blind independent or dependent?


3-34. A student is taking a history course and an English course. He
decides that the probability of passing the history course is .75
and the probability of passing the English course is .84. If these
events are independent, what is the probability that (a) he passes
both courses; (b) he passes exactly one of them?
3-35. A company has three identical machines operating independently
of each other. The probability of any one machine breaking
down during the next year is .05. What is the probability that
during the next year there will be no breakdowns?


3-36. A machine has two parts that could fail and have to be replaced.
The probabilities of failure of parts A and B are .17 and .12,
respectively. If failures of these parts are independent of each
other, what is the probability that at least one of them will fail?


3-37. For the experiment of tossing a single fair coin 3 times, let E be
the event the first toss is a head and F be the event 2 heads and
1 tail are tossed. Are E and F independent?


3.5 Bayes’ Theorem


3-38. A manufacturing company has a fabrication plant and an
assembly line. The fabrication plant has 60% of the employees
and the assembly line 40%. During the past year 35% of the
workers in the fabrication plant sustained injuries and 20% of
the assembly line workers had injuries.
(a) What percentage of all workers had injuries in this period?
(b) If an employee had an injury, what is the probability that
he worked on the assembly line?


3-39. Two jars contain coins. Jar I contains 5 pennies, 4 nickels and 6
dimes. Jar II contains 6 pennies, 4 nickels and 2 dimes. A jar is
selected at random and a coin is selected from that jar. If the
coin is a nickel, what is the probability that it came from Jar II?


3-40. An insurance company divides its policyholders into low-risk
and high-risk classes. For the year, of those in the low-risk class,
80% had no claims, 15% had one claim, and 5% had 2 claims.
Of those in the high-risk class, 50% had no claims, 30% had one
claim, and 20% had two claims. Of the policyholders, 60% were
in the low-risk class and 40% in the high-risk class.

(a) If a policyholder had no claims in the year, what is the
probability that he is in the low-risk class?

(b) If a policyholder had two claims in the year, what is the
probability that he is in the high-risk class?
3-41. A manufacturer has three machines producing light bulbs.
Machine A produces 40% of the light bulbs with 1% of them
defective. Machine B produces 35% of them with 2% being
defective. Machine C produces 25% with 4% being defective. If
a light bulb is tested and found to be defective, what is the
probability that it was produced by machine A?


3-42. A skin test for a disease is less expensive but less accurate than
an X-ray. In a country 20% of the adult population has this
disease. For a person with the disease, the skin test is positive
95% of the time. If a person does not have the disease, it will be
positive 30% of the time.

(a) What is the probability that a person who tests positive
does not have the disease?

(b) What is the probability that a person who tests negative
has the disease?


3-43. A card is drawn from a deck, not replaced, and a second card is
drawn. What is the probability that the second card is a heart?


3-44. A company classifies injuries to its workers as minor if the
worker does not have to take time off and severe if the worker
has to take time off. The company has two plants, A and B. In
plant A 60% of the workers had no injuries, 30% had minor
injuries, and 10% had severe injuries. In plant B 50% had no
injuries, 35% minor injuries, and 15% severe injuries. 70% of all
workers work in plant A and 30% in plant B. What is the
probability that a worker with a severe injury worked in plant A?


3-45. In Exercise 3-44, what is the probability that a worker who had
an injury worked in plant B and had a minor injury?


3.7 Sample Actuarial Examination Problems


3-46. The probability that a visit to a primary care physician’s (PCP)
office results in neither lab work nor referral to a specialist is
35%. Of those coming to a PCP's office, 30% are referred to
specialists and 40% require lab work.


Determine the probability that a visit to a PCP's office results in
both lab work and referral to a specialist.
3-47. You are given P[A ∪ B] = 0.7 and P[A ∪ B′] = 0.9.
Determine P[A].


3-48. An insurance company examines its pool of auto insurance
customers and gathers the following information:


(i) All customers insure at least one car.

(ii) 64% of the customers insure more than one car.

(iii) 20% of the customers insure a sports car.

(iv) Of those customers who insure more than one car, 15%
insure a sports car.


What is the probability that a randomly selected customer
insures exactly one car, and that car is not a sports car?


3-49. Among a large group of patients recovering from shoulder
injuries, it is found that 22% visit both a physical therapist and a
chiropractor, whereas 12% visit neither of these. The probability
that a patient visits a chiropractor exceeds by 0.14 the probability
that a patient visits a physical therapist.


Determine the probability that a randomly chosen member of
this group visits a physical therapist.


3-50. A survey of a group’s viewing habits over the last year revealed
the following information:


(i) 28% watched gymnastics

(ii) 29% watched baseball

(iii) 19% watched soccer

(iv) 14% watched gymnastics and baseball
(v) 12% watched baseball and soccer

(vi) 10% watched gymnastics and soccer
(vii) 8% watched all three sports.


Calculate the percentage of the group that watched none of the
three sports during the last year.
3-51. An actuary studying the insurance preferences of automobile
owners makes the following conclusions:


(i) An automobile owner is twice as likely to purchase
collision coverage as disability coverage.

(ii) The event that an automobile owner purchases collision
coverage is independent of the event that he or she
purchases disability coverage.

(iii) The probability that an automobile owner purchases both
collision and disability coverages is 0.15.


What is the probability that an automobile owner purchases
neither collision nor disability coverage?


3-52. An insurance company pays hospital claims. The number of
claims that include emergency room or operating room charges
is 85% of the total number of claims. The number of claims that
do not include emergency room charges is 25% of the total
number of claims. The occurrence of emergency room charges is
independent of the occurrence of operating room charges on
hospital claims.

Calculate the probability that a claim submitted to the insurance
company includes operating room charges.


3-53. The number of injury claims per month is modeled by a random
variable N with

$$P[N = n] = \frac{1}{(n + 1)(n + 2)}, \quad \text{where } n \ge 0.$$


Determine the probability of at least one claim during a
particular month, given that there have been at most four claims
during that month.


3-54. A public health researcher examines the medical records of a
group of 937 men who died in 1999 and discovers that 210 of the
men died from causes related to heart disease.

Moreover, 312 of the 937 men had at least one parent who
suffered from heart disease, and, of these 312 men, 102 died
from causes related to heart disease.


Determine the probability that a man randomly selected from
this group died of causes related to heart disease, given that
neither of his parents suffered from heart disease.
3-55. An urn contains 10 balls: 4 red and 6 blue. A second urn contains
16 red balls and an unknown number of blue balls. A single ball
is drawn from each urn. The probability that both balls are the
same color is 0.44.


Calculate the number of blue balls in the second urn.


3-56. An actuary is studying the prevalence of three health risk factors,
denoted by A, B, and C, within a population of women. For each
of the three factors, the probability is 0.1 that a woman in the
population has only this risk factor (and no others). For any two of
the three factors, the probability is 0.12 that she has exactly these
two risk factors (but not the other). The probability that a woman
has all three risk factors, given that she has A and B, is 1/3.


What is the probability that a woman has none of the three risk
factors, given that she does not have risk factor A?


3-57. An insurer offers a health plan to the employees of a large
company. As part of this plan, the individual employees may
choose exactly two of the supplementary coverages A, B, and C,
or they may choose no supplementary coverage. The proportions
of the company’s employees that choose coverages A, B, and C
are 1/4, 1/3, and 5/12, respectively.


Determine the probability that a randomly chosen employee will
choose no supplementary coverage.


An insurance company estimates that 40% of policyholders who 
have only an auto policy will renew next year and 60% of 
policyholders who have only a homeowners policy will renew 
next year. The company estimates that 80% of policyholders 
who have both an auto and a homeowners policy will renew at 
least one of those policies next year. Company records show that 
65% of policyholders have an auto policy, 50% of policyholders 
have a homeowners policy, and 15% of policyholders have both 
an auto and a homeowners policy. 


Using the company’s estimates, calculate the percentage of 
policyholders that will renew at least one policy next year. 
A blood test indicates the presence of a particular disease 95% of
the time when the disease is actually present. The same test 
indicates the presence of the disease 0.5% of the time when the 
disease is not present. One percent of the population actually has 
the disease. 


Calculate the probability that a person has the disease given that 
the test indicates the presence of the disease. 


An insurance company issues life insurance policies in three 
separate categories: standard, preferred, and ultra-preferred. Of the 
company's policyholders, 50% are standard, 40% are preferred,
and 10% are ultra-preferred. Each standard policyholder has
probability 0.010 of dying in the next year, each preferred policyholder
has probability 0.005 of dying in the next year, and each ultra- 
preferred policyholder has probability 0.001 of dying in the next 
year. A policyholder dies in the next year. 


What is the probability that the deceased policyholder was ultra- 
preferred? 


Upon arrival at a hospital's emergency room, patients are catego- 
rized according to their condition as critical, serious, or stable. In 
the past year: 


(i) 10% of the emergency room patients were critical;
(ii) 30% of the emergency room patients were serious;
(iii) the rest of the emergency room patients were stable;
(iv) 40% of the critical patients died;
(v) 10% of the serious patients died; and
(vi) 1% of the stable patients died.


Given that a patient survived, what is the probability that the 
patient was categorized as serious upon arrival? 
3-62. An actuary studied the likelihood that different types of drivers
would be involved in at least one collision during any one-year
period. The results of the study are presented below.


Type of Driver | Percentage of all drivers | Probability of at least one collision
Teen | 8% | 0.15
Young Adult | |
Midlife | |


Given that a driver has been involved in at least one collision in
the past year, what is the probability that the driver is a young
adult driver?


3-63. The probability that a randomly chosen male has a circulation
problem is 0.25. Males who have a circulation problem are twice
as likely to be smokers as those who do not have a circulation
problem.


What is the conditional probability that a male has a circulation
problem, given that he is a smoker?


3-64. A health study tracked a group of persons for five years. At the
beginning of the study, 20% were classified as heavy smokers, 
30% as light smokers, and 50% as nonsmokers. Results of the 
study showed that light smokers were twice as likely as 
nonsmokers to die during the five-year study, but only half as 
likely as heavy smokers. A randomly selected participant from 
the study died over the five-year period. 


Calculate the probability that the participant was a heavy 
smoker. 


Chapter 4 
Discrete Random Variables 


4.1 Random Variables 
4.1.1 Defining a Random Variable


Random variables surround us. The (unknown) number of years that you 
are going to live is a random variable, as is the number of auto insurance 
claims you will file in your lifetime and the number of TV sets owned by 
a randomly selected American family. Next year's return on your stock 
portfolio is a random variable, and so is your weight after Thanksgiving. 
The number you roll when you toss dice at a table in Las Vegas is also a
random variable — gambling is always with us in probability. The key 
feature in each of these random variables is that the outcome of interest 
is a number (a count of insurance claims or a weight measurement) and 
it depends on chance. Most of us try not to have accidents or gain 
weight, but somehow those things are forced on us by chance. This leads 
to an intuitive definition of a random variable. 


Definition 4.1 A random variable is a numerical quantity whose
value depends on chance.¹


¹ This nice intuitive description of a random variable is taken from Weiss [18], who
adapted it from the words of the mathematician B.V. Gnedenko.




Example 4.1 You are tossing a coin twice and will bet on the 
number of heads. The outcome is a number (0, 1 or 2) which depends on 
chance. The number of heads is a random variable. □


Example 4.2 You are tossing a coin twice and will bet on specific
outcomes such as "first a head then a tail" or HT. The outcome depends
on chance, but is not a number. This is not a random variable. □


Example 4.3 A resident of Winsted, Connecticut, is selected at
random and his height is measured. The height is a number which
depends on the chance event of random selection. The height is a
random variable. □


Example 4.4 You go to Las Vegas and begin to put quarters in a
slot machine. Let X be the number of quarters you play before your first
win of any amount. X is a number and depends on chance. X is a
random variable. □


There is an important difference between the height random 
variable in Example 4.3 and the other random variables. Height can be 
measured with such precision that any number between two given 
heights is still a theoretically possible height — if you are given the two 
heights (in inches) 66 and 66.01, any number between 66 and 66.01 is 
still a theoretically possible height. For this reason, height is said to be
measured on a continuous scale, and the height random variable is called 
a continuous random variable. In contrast, the outcomes 0, 1 and 2 for 
the numbers in Example 4.1 are distinct, and the values between them 
are not possible. This kind of random variable is called a discrete ran- 
dom variable. In Example 4.4, the possible numbers of attempts before 
the first win at a slot machine are (0, 1, 2, 3, ... ). This sample space is 
discrete and infinite — as any visitor to a casino will attest. 

In this chapter we will study only discrete random variables. 
Continuous random variables require a different approach, one that
uses calculus. They will be studied in Chapter 7.

Intelligent people often get into ridiculous arguments over whether 
a certain random variable is truly discrete or continuous. For example, 
one of our students became quite excited over the argument that he 
would measure heights to at most 3 decimal places, which meant that 
heights were discrete for him. That is an unproductive argument. The 
real point is that calculus-based continuous mathematics is the most 
efficient way to analyze heights. When we say that heights are
continuous, we are really just identifying the kind of mathematical model we
will use. 


4.1.2 Redefining a Random Variable 


Our approach in this text is intuitive and applied. More advanced books 
in probability give more rigorous definitions which are a bit harder to 
understand at first sight. A widely used definition of a random variable 
is the following. 


Definition 4.1a A random variable is a function mapping the 
sample space to the real numbers. 


The idea behind this definition can be visualized by looking at the 
example of the number of heads when two coins are tossed. When we 
look at the results of the tosses, we assign numerical results to the 
physical outcomes we see. 


Original Outcome | Number of Heads
HH | 2
HT | 1
TH | 1
TT | 0


This assignment of numerical values is a function from the sample space 
to the real numbers — as the last definition states. We will not use the 
more rigorous definition any further in this text. 
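For readers who program, the function idea can be made concrete. The short Python sketch below is our own illustration (the names `sample_space` and `number_of_heads` are hypothetical, not from the text): it lists the four outcomes of two tosses and applies a function that maps each physical outcome to its number of heads.

```python
from itertools import product

# Sample space for two tosses of a coin: HH, HT, TH, TT
sample_space = ["".join(t) for t in product("HT", repeat=2)]

def number_of_heads(outcome: str) -> int:
    """The random variable X: maps a physical outcome to a real number."""
    return outcome.count("H")

# Tabulate the mapping outcome -> X(outcome)
mapping = {outcome: number_of_heads(outcome) for outcome in sample_space}
print(mapping)  # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}
```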


4.1.3 Notation; The Distinction Between X and x 


Random variables are usually denoted by capital letters. If we were to 
look at the random variable for the number of heads in two coin tosses, 
we might use X to represent the entire random variable which can take 
on any of the values 0, 1 or 2. However, specific outcomes are usually 
referred to using small letters. Thus the reader will see statements like 
“Let x be the number of heads in the first two coin tosses.” This refers to
a single realized outcome, not to the entire random variable. This
confuses students, and the confusion is increased by the convention that
if x heads are tossed the notation is mixed: we write “X = x.” The
reader should be aware that we are not arbitrarily mixing capital and
small letters in our notation. The notation has a purpose, and the
statement “X = x” is not nonsense. It means that the random variable X
was realized with a specific value x.


4.2 The Probability Function of a Discrete Random
Variable


4.2.1 Defining the Probability Function 


If we decide to bet on the number of heads which will occur when a fair 
coin is tossed twice, we can better manage our risk if we have a table of 
all possible outcomes and their probabilities. The following table gives 
this useful information. 


Number of heads (x) | 0 | 1 | 2
p(x) | .25 | .50 | .25


This table assigns a probability to each individual outcome. Once we 
have such a function, we can use it to find the probability of any event 
by adding the probabilities of the individual outcomes in the event. 


Definition 4.2 Let X be a discrete random variable. A probability
function for X is a function p(x) which assigns a probability to each
value of the random variable, such that


(a) p(x) ≥ 0 for all x, and


(b) Σ p(x) = 1. (The sum of all individual outcome probabilities
is 1.)


The probability function is also referred to as the probability mass 
function or the discrete density function for X. 
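The two conditions of Definition 4.2, and the rule that the probability of an event is the sum of the probabilities of its individual outcomes, can be checked mechanically. In this minimal Python sketch the probability function for the two-coin example is stored as a dictionary; the helper names `is_probability_function` and `prob_of_event` are our own. Exact fractions are used so that condition (b) can be tested with an exact equality.

```python
from fractions import Fraction

# Probability function for the number of heads in two tosses of a fair coin
p = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def is_probability_function(pf: dict) -> bool:
    """Check Definition 4.2: (a) p(x) >= 0 for all x, (b) the p(x) sum to 1."""
    return all(v >= 0 for v in pf.values()) and sum(pf.values()) == 1

def prob_of_event(pf: dict, event) -> Fraction:
    """P(event) = sum of p(x) over the outcomes x in the event."""
    return sum((pf[x] for x in event if x in pf), Fraction(0))

print(is_probability_function(p))   # True
print(prob_of_event(p, {1, 2}))     # 3/4, the probability of at least one head
```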

For discrete random variables with a finite number of individual 
outcomes, the probability function can be given by a table. This was 
done for the two coin toss problem at the beginning of this section. 




Example 4.5 In Example 3.9, a large HMO studied the number of 
children in a given birth. The probability function was as follows: 


Number of children (x) | 1 | 2 | 3
p(x) | .9761 | .0231 | .0008


□


Example 4.6 In Example 3.10, an automobile insurer studied the 
number of claims filed by a policyholder in a given year. The probability 
function was as follows: 


Number of claims (x) | 0 | 1 | 2 | 3
p(x) | .72 | .22 | .05 | .01


□


If a discrete random variable has a very large or infinite number of 
possible outcomes, a simple table is not possible, and p(x) must be
specified in some other way — usually by a formula. 


Example 4.7 On a certain slot machine, the probability of win- 
ning on an individual play is .05. Let X be the number of unsuccessful 
attempts before the first win. If we assume that successive plays are 
independent, the probability of k unsuccessful plays before the first win 
is given by the multiplication rule for independent events. 


$$p(k) = P(X = k) = (.95)^k(.05), \qquad k = 0, 1, 2, \ldots$$
□
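A brief Python sketch of this probability function follows, assuming (as the example does) independent plays with win probability .05; the function name `p` and its `win` parameter are our own. Since there are infinitely many outcomes, condition (b) of Definition 4.2 can only be checked approximately here, with a long partial sum.

```python
# Probability function of Example 4.7: p(k) = (.95)**k * (.05), k = 0, 1, 2, ...
def p(k: int, win: float = 0.05) -> float:
    """Probability of exactly k losing plays before the first win."""
    return (1 - win) ** k * win

print([round(p(k), 4) for k in range(4)])  # [0.05, 0.0475, 0.0451, 0.0429]

# The infinite sum must equal 1; a long partial sum gets very close
print(sum(p(k) for k in range(1000)))
```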
4.2.2 The Cumulative Distribution Function 


Example 4.8 A clinical researcher is studying a fatal disease. The 
random variable of interest to her is X, the number (x = 1,2, ...) of the 
year following diagnosis in which a patient dies. Her studies lead to the 
probability table given below. 


Year of death after diagnosis (x) | 1 | 2 | 3 | 4
p(x) | .53 | .25 | .12 | .10


This probability function gives the probability that someone who is diag- 
nosed will die in a specific year following diagnosis. For example, the 
empirical probability that a person diagnosed today will die sometime
during the third year from today is .12. However, the table does not
directly give the probability that a person will die during the first two
years or the first three years. These probabilities are given by


$$P(X \le 2) = p(1) + p(2) = .53 + .25 = .78$$
and


$$P(X \le 3) = p(1) + p(2) + p(3) = .53 + .25 + .12 = .90.$$
□


These useful probabilities are obtained by cumulatively adding 
successive probabilities in the table above. If we do this throughout the 
table, we obtain the cumulative distribution function F(x).


Definition 4.3 Let X be a random variable. The cumulative
distribution function F(x) for X is defined by


$$F(x) = P(X \le x).$$


For a discrete random variable, we can find F(x) by adding all values of
p(y) for y ≤ x.


Example 4.9 The cumulative distribution function for the proba- 
bility function of Example 4.8 is given by the following table: 


x | 1 | 2 | 3 | 4
F(x) | .53 | .78 | .90 | 1.00


This tells us, for example, that for those diagnosed with the disease, the
probability of death within 3 years of diagnosis is 90%. □
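The cumulative sums in Example 4.9 are running totals of the probability table, so they can be produced with Python's `itertools.accumulate`. This is a small sketch using the values of Example 4.8; the floating-point sums are rounded for display.

```python
from itertools import accumulate

# Example 4.8: year x following diagnosis in which a patient dies, and p(x)
years = [1, 2, 3, 4]
probs = [0.53, 0.25, 0.12, 0.10]

# F(x) = P(X <= x) is the running total of p(1), ..., p(x)
cdf = dict(zip(years, accumulate(probs)))

# prints: 1 0.53 / 2 0.78 / 3 0.9 / 4 1.0 (one pair per line)
for x, F in cdf.items():
    print(x, round(F, 2))
```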


Note that the last entry in the table for F(x) is 1.00. This will 
always hold for a finite discrete random variable. 


Example 4.10 In Example 4.6 we looked at the distribution of the 
number of claims filed in a year by a policyholder in a large insurance 
company. The cumulative distribution function is given by the following 
table: 
Number of claims (x) | 0 | 1 | 2 | 3
F(x) | .72 | .94 | .99 | 1.00


This tells us that 94% of policyholders file one claim or less in a year,
leaving 6% who file more than one claim. □


In Example 4.10 we gave values of F(x) only for x = 0, 1, 2, 3,
since those x-values represent the numbers of claims that actually
occurred. Although it is not possible to have 0.5 claims, we can define
F(.5).


$$F(.5) = P(X \le .5) = P(X \le 0) = P(X = 0) = .72$$


Since it is not possible to have an actual claim number in the open
interval (0, 1), we can see that


$$F(x) = P(X \le x) = P(X \le 0) = .72, \qquad 0 \le x < 1.$$


Continuing this reasoning, we can write a definition of F(x) for any real
number.


$$F(x) = \begin{cases} 0, & x < 0 \\ .72, & 0 \le x < 1 \\ .94, & 1 \le x < 2 \\ .99, & 2 \le x < 3 \\ 1.00, & x \ge 3 \end{cases}$$
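The piecewise definition can be written as a function defined for every real number. In this sketch (the function name `F` is ours), `bisect.bisect_right` counts how many possible claim numbers are at or below x, and F(x) is the cumulative value at the last of them.

```python
import bisect

# Example 4.10: claim counts and the cumulative probabilities F at those counts
claim_counts = [0, 1, 2, 3]
cdf_values = [0.72, 0.94, 0.99, 1.00]

def F(x: float) -> float:
    """Step-function CDF defined for every real x: F(x) = P(X <= x)."""
    # Number of possible claim counts that are <= x
    i = bisect.bisect_right(claim_counts, x)
    return 0.0 if i == 0 else cdf_values[i - 1]

print(F(-1), F(0), F(0.5), F(1), F(2.7), F(10))  # 0.0 0.72 0.72 0.94 0.99 1.0
```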


The graph of F(x) is as follows:


[Figure: graph of F(x), a step function with jumps at x = 0, 1, 2, 3]


The cumulative distribution function for an infinite discrete 
random variable requires a bit more work. For example, the cumulative 
distribution function for the random variable in Example 4.7 requires use 
of the formula for the sum of a geometric series. This is reviewed next. 
Geometric Series Review


A geometric series is a series of the form a + ar + ar² + ··· +
arⁿ. The sum of the series for r ≠ 1 is given by


$$a + ar + ar^2 + \cdots + ar^n = a\,\frac{1 - r^{n+1}}{1 - r}.$$


The number r is called the ratio or common ratio. If |r| < 1,
we can sum the infinite geometric series.


$$a + ar + ar^2 + \cdots + ar^n + \cdots = \frac{a}{1 - r} \qquad (4.2a)$$
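Both closed forms can be checked numerically against a brute-force sum. The Python sketch below uses a = .05 and r = .95, borrowed from the slot machine problem; the function names are our own.

```python
def geometric_partial_sum(a: float, r: float, n: int) -> float:
    """Closed form for a + ar + ... + ar**n, valid for r != 1."""
    return a * (1 - r ** (n + 1)) / (1 - r)

def geometric_infinite_sum(a: float, r: float) -> float:
    """Closed form for a + ar + ar**2 + ..., valid only for |r| < 1."""
    if abs(r) >= 1:
        raise ValueError("the infinite series converges only for |r| < 1")
    return a / (1 - r)

# Compare each closed form with a brute-force sum
a, r = 0.05, 0.95
brute = sum(a * r ** k for k in range(11))                     # terms k = 0..10
print(abs(brute - geometric_partial_sum(a, r, 10)) < 1e-12)    # True
print(geometric_infinite_sum(a, r))                            # approximately 1.0
```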


Example 4.11 You play a slot machine repeatedly. (How else?) 
The probability of winning on a single play is .05, and successive plays 
are independent. The random variable of interest is X, the number of 
unsuccessful attempts before the first win. Find an expression for F(x). 
Solution In Example 4.7, we showed that 


$$p(k) = P(X = k) = (.95)^k(.05).$$
The cumulative distribution function is given by


$$\begin{aligned} F(x) &= p(0) + p(1) + \cdots + p(x) \\ &= .05 + .95(.05) + .95^2(.05) + \cdots + .95^x(.05) \\ &= .05\,\frac{1 - .95^{x+1}}{1 - .95} = 1 - .95^{x+1}. \end{aligned}$$
□


It is interesting to interpret these values of F(x). For example, the value 
F(4) = P(X ≤ 4) ≈ .226 is the probability that at most 4 unsuccessful
plays will occur before the first win. Then 1 - F(4) = P(X > 4) ≈ .774
is the probability that at least 5 unsuccessful plays will occur before the
first win. You have a 77.4% probability of losing at least 5 times before
the first win. This means that if you play the slot machine five times in a
row, the probability of losing all 5 times is approximately .774 and the
probability of winning at least once in the 5 plays is F(4) ≈ .226.
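The closed form F(x) = 1 - .95^(x+1) can be cross-checked against a direct sum of the individual probabilities. A minimal Python sketch, assuming the same independent plays with win probability .05:

```python
# Slot machine of Examples 4.7 and 4.11: F(x) = 1 - .95**(x + 1)
def F(x: int) -> float:
    """P(X <= x): probability of at least one win in x + 1 plays."""
    return 1 - 0.95 ** (x + 1)

# Cross-check the closed form against summing p(k) = .95**k * .05 directly
direct = sum(0.95 ** k * 0.05 for k in range(5))
print(round(F(4), 3), round(direct, 3))   # 0.226 0.226
print(round(1 - F(4), 3))                 # 0.774, P(at least 5 losses)
```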

This interpretation of the cumulative distribution in the slot 
machine problem holds for any x. F(x) is the probability that you win at 
least once in x + 1 successive plays. This is used in the next example.


Example 4.12 How many times would you need to play the slot 
machine in Example 4.11 in order to be sure that your probability of 
winning at least once is greater than or equal to .99? 

Solution F(k - 1) = 1 - .95ᵏ is the probability that you win at
least once in k successive plays. We need this probability to be at least
.99. Set

$$1 - .95^k = .99.$$
Then
$$.95^k = .01$$


$$\ln(.95^k) = k \ln(.95) = \ln(.01)$$


$$k = \frac{\ln(.01)}{\ln(.95)} \approx 89.78.$$


You need k = 89.78 (round up to 90) plays for the probability to be 99%
that you win at least once. Note that since k was between 89 and 90, the
probability of winning at least once in 89 plays is less than .99 and the
probability of winning at least once in 90 plays is more than .99.
Rounding up to 90 guarantees that the probability is at least .99. In
problems like this one, the value of k is always rounded up. If k had
been 89.12, we still would have rounded up. □
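The calculation in Example 4.12 can be reproduced in a few lines of Python. This sketch uses `math.log` and `math.ceil` to solve for k and round it up, and then checks the two whole numbers of plays on either side of 89.78.

```python
import math

target = 0.99
# Solve 1 - .95**k >= .99 for k, then round up to a whole number of plays
k_exact = math.log(1 - target) / math.log(0.95)
k = math.ceil(k_exact)
print(round(k_exact, 2), k)            # 89.78 90

# Check the plays on either side of the cutoff
print(1 - 0.95 ** (k - 1) >= target)   # False: 89 plays fall short
print(1 - 0.95 ** k >= target)         # True: 90 plays suffice
```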


4.3 Measuring Central Tendency; Expected Value 
4.3.1 Central Tendency; The Mean 


When we try to interpret numerical information that has a wide range of 
values, we like to reduce our confusion by looking at a single number 
which summarizes the information. For example, when tests are returned 
to a class, students are usually interested in the test average as well as 
the distribution of grades. In the next example, we will introduce a basic
concept by looking at a distribution of grades. 


Example 4.13 A large lecture class with 100 students was given a
10-point quiz. The lowest score actually recorded was a 5. The distribution
of scores (from 5 to 10) is given in the following table.


Score | 5 | 6 | 7 | 8 | 9 | 10
Number of students | 5 | 10 | 45 | 20 | 10 | 10


Students are interested in two things: the percentage of students at each 
grade level and the class average. The percentage of students at each 
grade level is given next. 


Score | 5 | 6 | 7 | 8 | 9 | 10
Percentage of students | 5% | 10% | 45% | 20% | 10% | 10%


Note that we could reinterpret this table as a probability function of a
random variable X. Suppose a student score X is chosen at random
from the class. What is the probability p(x) that the student score is x?
The next table repeats the previous one in probability function format.


x | 5 | 6 | 7 | 8 | 9 | 10
p(x) | .05 | .10 | .45 | .20 | .10 | .10

The previous tables show the grade distribution, but people still want to 
know what the “average” is. The word "average" is in quotes here 
because there are different kinds of averages that can be calculated. 
More will be said about this later. The "average" that is most familiar to 
students is the mean, which is calculated by adding up all 100 student 
scores and dividing by 100. We do not really have to add 100 separate 
scores, since we can add 5 scores of 5 by multiplying 5 x 5, add 10 
scores of 6 by multiplying 6 x 10, and so on. The mean is given by 


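$$\text{Class Mean} = \frac{5 \cdot 5 + 6 \cdot 10 + 7 \cdot 45 + 8 \cdot 20 + 9 \cdot 10 + 10 \cdot 10}{100} = 7.50$$


This mean can be rewritten in terms of the probabilities for the grade
random variable by a little rearrangement of numbers.


$$\text{Class Mean} = 5 \cdot \frac{5}{100} + 6 \cdot \frac{10}{100} + 7 \cdot \frac{45}{100} + 8 \cdot \frac{20}{100} + 9 \cdot \frac{10}{100} + 10 \cdot \frac{10}{100}$$


$$= 5(.05) + 6(.10) + 7(.45) + 8(.20) + 9(.10) + 10(.10)$$


$$= \sum_x x \cdot p(x)$$
□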


This example shows that if we are given numerical results in the
form of a probability function, we can calculate the familiar mean (or
average) using the above result.


$$\text{Mean} = \sum_x x \cdot p(x)$$


When we are given a discrete random variable X, we are usually given
only the probability function p(x). The mean of the random variable X
can be obtained from p(x) by using the simple equation above.

The mean of the random variable is also called the expected value 
of the random variable. 


Definition 4.4 Let X be a discrete random variable. The expected 
value of X is defined by 


$$E(X) = \sum_x x \cdot p(x).$$
The expected value of the random variable X is often denoted by the
Greek letter μ (pronounced “mew”).


$$E(X) = \mu$$


Example 4.14 The probability function for the random variable in
Example 4.5 (number of children in a birth) was as follows:


Number of children (x) | 1 | 2 | 3
p(x) | .9670 | .0311 | .0019


Then the mean is
$$\mu = E(X) = 1(.9670) + 2(.0311) + 3(.0019) = 1.0349.$$
□
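Definition 4.4 translates directly into code. The Python sketch below (the helper name `expected_value` is our own) computes Σ x·p(x) with exact fractions for the quiz scores of Example 4.13 and for the birth data of Example 4.14.

```python
from fractions import Fraction

def expected_value(pf: dict) -> Fraction:
    """E(X) = sum of x * p(x) over all values x (Definition 4.4)."""
    return sum((x * px for x, px in pf.items()), Fraction(0))

# Example 4.13: quiz scores, with p(x) = (number of students) / 100
counts = {5: 5, 6: 10, 7: 45, 8: 20, 9: 10, 10: 10}
grades = {x: Fraction(n, 100) for x, n in counts.items()}
print(expected_value(grades))            # 15/2, i.e. 7.5

# Example 4.14: number of children in a birth
children = {1: Fraction("0.9670"), 2: Fraction("0.0311"), 3: Fraction("0.0019")}
print(float(expected_value(children)))   # 1.0349
```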
The calculations become more interesting if the discrete random
variable is infinite. It is necessary to look at another infinite series
formula before the next example.
Series Formula


The infinite geometric series given by Equation (4.2a) tells us
that for |x| < 1,


$$\sum_{k=0}^{\infty} x^k = 1 + x + x^2 + x^3 + \cdots = \frac{1}{1 - x}. \qquad (4.2b)$$


If we differentiate this infinite series term by term, and differentiate
the expression on the right in the usual manner, we see that
for |x| < 1,


$$\sum_{k=1}^{\infty} k x^{k-1} = 1 + 2x + 3x^2 + 4x^3 + \cdots = \frac{1}{(1 - x)^2}. \qquad (4.3)$$


Example 4.15 Let X be the random variable for the number of 
unsuccessful plays before the first win on the slot machine in Examples 
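4.7 and 4.11. The probability function is p(k) = P(X = k) = .95ᵏ(.05).
Then


$$\begin{aligned} \mu = E(X) &= \sum_{k=0}^{\infty} k \cdot p(k) = \sum_{k=0}^{\infty} k(.95^k)(.05) \\ &= 0(.05) + 1(.05)(.95) + 2(.05)(.95^2) + \cdots \\ &= (.05)(.95)\left[1 + 2(.95) + 3(.95)^2 + \cdots\right] \\ &= (.05)(.95)\,\frac{1}{(1 - .95)^2} = \frac{.95}{.05} = 19. \end{aligned}$$
□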

One common way of interpreting this result is to say that the 
average (mean) number of unsuccessful plays before the first win is 19. 
We could also say that the expected number of unsuccessful plays before 
the first win is 19. These verbal interpretations can be misleading. They 
do not say that you should expect to have exactly 19 unsuccessful plays 
and then the first win. Some players win on the first play and some on 
the fortieth. The expected value is not what you “expect” to happen. It is 
an average. 
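The series argument can be checked numerically. This Python sketch truncates the infinite sum at k = 2000 (an assumption on our part: the omitted terms are far too small to matter) and compares the result with the closed form obtained from Equation (4.3).

```python
# Example 4.15: X = number of losing plays before the first win, p(k) = .95**k * .05
q, win = 0.95, 0.05

# Truncate the infinite sum; the omitted tail is negligible by k = 2000
approx_mean = sum(k * q ** k * win for k in range(2000))
closed_form = q * win / (1 - q) ** 2     # (.05)(.95) / (1 - .95)**2, from Equation (4.3)
print(round(approx_mean, 6), round(closed_form, 6))   # 19.0 19.0
```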


4.3.2 The Expected Value of Y = aX


Example 4.16 In Example 4.6 we looked at the probability 
function for the random variable X, the number of claims filed by a 
policyholder in a large insurance company in a year. 
Number of claims (x) | 0 | 1 | 2 | 3
p(x) | .72 | .22 | .05 | .01


The expected number of claims is 
E(X) = 0(.72) + 1(.22) + 2(.05) + 3(.01) = .35. 


Suppose this table is for a type of policy which guarantees a fixed 
payment of $1000 for each claim. Then the amount paid to a 
policyholder in a year is just $1000 multiplied by the number of claims 
filed. The total claim amount is a new random variable Y = 1000X. We
now have two random variables, X and Y, and each random variable has
its own probability function. To avoid confusion, we will subscript the
probability function. The probability function for X is p_X(x) and the
probability function for Y is p_Y(y). The probability function for Y has
the same second row as the probability function for X, since
p_Y(1000x) = p_X(x).


Total claim amount (y) | 0 | 1000 | 2000 | 3000
p_Y(y) | .72 | .22 | .05 | .01


The expected claim amount is
E(Y) = 0(.72) + 1000(.22) + 2000(.05) + 3000(.01) = $350. □


Since E(X) = .35, then E(1000X) = E(Y) = 1000E(X). This
simple multiplication rule always works.


For any constant a and random variable X,
$$E(aX) = a \cdot E(X). \qquad (4.4a)$$


The derivation of Equation (4.4a) should be clear from Example
4.16. If Y = aX, p_Y(y) = p_Y(ax) = p_X(x). Then


$$E(Y) = E(aX) = \sum_x ax \cdot p_Y(ax) = a \sum_x x \cdot p_X(x) = a \cdot E(X).$$


The expected claim amount for the year is often called the pure 
premium for the insurance policy. If the company charges the mean 
amount of $350 per year for each policy sold, and its experience actually
follows the assumed probability function, then there will be just enough 
money to pay all claims. This is pursued in Exercises 4-7 and 4-8. 


The useful rule for Y = aX can be extended to a rule for aX + b.


For any constants a and b and random variable X,


$$E(aX + b) = a \cdot E(X) + b. \qquad (4.4b)$$


The derivation of Equation (4.4b) is left as Exercise 4-9. 


Example 4.17 The company in Example 4.16 has a yearly fixed 
cost of $100 per policyholder for administering the insurance policy. 
Thus its total cost in a year for a policy is the sum of the claim payments 
and the administrative cost. 


Total cost per policy = 1000X + 100
The expected cost per policy per year is
$$E(1000X + 100) = 1000E(X) + 100 = \$450.$$
□
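Equations (4.4a) and (4.4b) can be confirmed on the claim distribution by building the probability function of the new random variable explicitly, as in Example 4.16, and computing its mean directly. In this sketch the helper `transform` is our own and assumes a one-to-one transformation, so that p_Y(g(x)) = p_X(x).

```python
# Examples 4.16 and 4.17: claim-count distribution
pf_X = {0: 0.72, 1: 0.22, 2: 0.05, 3: 0.01}

def expected_value(pf: dict) -> float:
    return sum(x * px for x, px in pf.items())

def transform(pf: dict, g) -> dict:
    """Probability function of Y = g(X) for a one-to-one g: p_Y(g(x)) = p_X(x)."""
    return {g(x): px for x, px in pf.items()}

E_X = expected_value(pf_X)
E_claims = expected_value(transform(pf_X, lambda x: 1000 * x))      # Y = 1000X
E_cost = expected_value(transform(pf_X, lambda x: 1000 * x + 100))  # 1000X + 100
print(round(E_X, 4), round(E_claims, 2), round(E_cost, 2))  # 0.35 350.0 450.0
```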


4.3.3 The Mode


The mean of a random variable is the most widely used single measure 
of central tendency. There are other measures which are also informa- 
tive. One of these, the median or fiftieth percentile, will be covered in 
Chapter 7. The other, the mode, is discussed below. 


Definition 4.5 The mode of a probability function is the value of
x which has the highest probability p(x).


Example 4.18 The mode of the probability function for the
number of claims is x = 0, as the table clearly shows.


Number of claims (x) | 0 | 1 | 2 | 3
p(x) | .72 | .22 | .05 | .01


The mode will be used infrequently in this text. The more widely used
tools in probability theory rely more on the mean. □
4.4 Variance and Standard Deviation
4.4.1 Measuring Variation


The mean of a random variable gives a nice single summary number to 
measure central tendency. However, two different random variables can 
have the same mean and still be quite different. The next example 
illustrates this. 


Example 4.19 Below we give probability functions representing 
quiz scores for two different classes. 


First class: random variable X


x | 7 | 8 | 9
p(x) | .20 | .60 | .20


Second class: random variable Y


y | 6 | 8 | 10
p(y) | .20 | .60 | .20


Each random variable has a mean of 8.


E(X) = 7(.20) + 8(.60) + 9(.20) = 8 
E(Y) = 6(.20) + 8(.60) + 10(.20) = 8 


However, the two random variables are clearly quite different. There is 
much more variation or dispersion in Y than in X. The question is how 
to measure that variation. One possible suggestion is to measure 
dispersion by looking at the distance of each individual value x or y 
from the mean of its distribution. This is shown in the tables below. 


First class: random variable for distance from mean, X - 8


x - 8 | 7 - 8 = -1 | 8 - 8 = 0 | 9 - 8 = 1
p(x) | .20 | .60 | .20


Second class: random variable Y - 8


y - 8 | 6 - 8 = -2 | 8 - 8 = 0 | 10 - 8 = 2
p(y) | .20 | .60 | .20
The expected value of each of the random variables X - 8 and Y - 8
gives an average distance from the original mean. Unfortunately, this
average is of no use in measuring dispersion. Positive and negative
values cancel each other out, and we find E(X - 8) = E(Y - 8) = 0.
(E(X - μ) = 0 for any distribution with μ = E(X).) However, if we
look at the square of the distance from the mean, this problem does not
occur.


First class: random variable (X - 8)²


(x - 8)² | (7 - 8)² = 1 | (8 - 8)² = 0 | (9 - 8)² = 1
p(x) | .20 | .60 | .20


Second class: random variable (Y - 8)²


(y - 8)² | (6 - 8)² = 4 | (8 - 8)² = 0 | (10 - 8)² = 4
p(y) | .20 | .60 | .20


The expected value of each of these new random variables gives an 
average squared distance from the mean. 


E[(X - 8)²] = 1(.20) + 0(.60) + 1(.20) = 0.4
E[(Y - 8)²] = 4(.20) + 0(.60) + 4(.20) = 1.6


This is the single measure of variation that is most widely used in
probability theory. □


Definition 4.6 The variance of a random variable X is defined to 
be 


$$V(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2\,p(x).$$


The standard deviation of a random variable is the square root of its
variance. It is denoted by the Greek letter σ.


$$\sigma = \sqrt{V(X)}$$
The variance is also written as V(X) = σ².
If more than one random variable is being studied, subscripts are
used to associate mean and standard deviation with the proper random
variable.
Example 4.20 For the random variables X and Y in Example
4.19, we write the following:


$$\mu_X = \mu_Y = 8$$
$$V(X) = \sigma_X^2 = .40 \qquad V(Y) = \sigma_Y^2 = 1.6$$


$$\sigma_X = \sqrt{\sigma_X^2} = \sqrt{.40} \approx .632 \qquad \sigma_Y = \sqrt{\sigma_Y^2} = \sqrt{1.6} \approx 1.265$$


Note that the random variable Y, which is more dispersed, has a greater
variance and standard deviation. □
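Definition 4.6 can be applied mechanically. The Python sketch below (helper names `mean` and `variance` are ours) recomputes the means, variances, and standard deviations of Example 4.20 from the probability tables of Example 4.19.

```python
import math

def mean(pf: dict) -> float:
    return sum(x * px for x, px in pf.items())

def variance(pf: dict) -> float:
    """V(X) = E[(X - mu)**2] = sum of (x - mu)**2 * p(x) (Definition 4.6)."""
    mu = mean(pf)
    return sum((x - mu) ** 2 * px for x, px in pf.items())

# Example 4.19: quiz scores for the two classes
X = {7: 0.20, 8: 0.60, 9: 0.20}
Y = {6: 0.20, 8: 0.60, 10: 0.20}
for name, pf in (("X", X), ("Y", Y)):
    v = variance(pf)
    print(name, round(mean(pf), 3), round(v, 3), round(math.sqrt(v), 3))
# X 8.0 0.4 0.632
# Y 8.0 1.6 1.265
```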


4.4.2 The Variance and Standard Deviation of Y = aX


If Y = aX, we already know that μ_Y = E(Y) = a · E(X) = a · μ_X.
Recall that if Y = aX, then p_Y(y) = p_Y(ax) = p_X(x). Then


$$V(Y) = \sum_y (y - \mu_Y)^2\,p_Y(y) = \sum_x (ax - a\mu_X)^2\,p_X(x) = a^2 \sum_x (x - \mu_X)^2\,p_X(x) = a^2 \cdot V(X).$$


This gives us a simple way to find V(Y) = V(aX).


$$V(aX) = a^2 \cdot V(X) \qquad (4.5a)$$


The standard deviation of aX can now be obtained by taking the
square root: σ_aX = |a| · σ_X.


Example 4.21 We return to the distributions of claim number and 
claim amount given in Example 4.16. The probability function for claim 
number random variable X was as follows: 


Number of claims (x) | 0 | 1 | 2 | 3
p(x) | .72 | .22 | .05 | .01


We found that E(X) = .35. Using Definition 4.6, V(X) is given by
$$\begin{aligned} \sigma^2 &= E[(X - \mu)^2] \\ &= .72(0 - .35)^2 + .22(1 - .35)^2 + .05(2 - .35)^2 + .01(3 - .35)^2 \\ &= .3875. \end{aligned}$$


$$\sigma = \sqrt{.3875} \approx .622495$$


The probability function for the claim amount random variable У was 


Total claim amount (y) | 0 | 1000 | 2000 | 3000
p(y) | .72 | .22 | .05 | .01


We previously found E(Y) = 1000(.35) = 350. V(Y) does not have to 
be calculated directly. Instead we write 


$$V(Y) = V(1000X) = 1000^2 \cdot V(X) = 1{,}000{,}000(.3875) = 387{,}500.$$


The reader can check this result by direct calculation. □


The useful rule (4.5a) can be extended to handle Y = aX + b.


$$V(aX + b) = a^2 \cdot V(X) \qquad (4.5b)$$


A derivation of Equation (4.5b) is outlined in Exercise 4-14. The 
intuitive idea is that if all values are shifted by exactly b units, the mean 
changes but the dispersion around the new mean is exactly as before. 
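Equation (4.5b) can be checked on the claim distribution by computing the variance of Y = 1000X + 100 directly from its own probability function. This is a sketch using the probability table of Example 4.21; the helper `variance` is our own.

```python
# Claim-count distribution of Example 4.21
pf_X = {0: 0.72, 1: 0.22, 2: 0.05, 3: 0.01}

def variance(pf: dict) -> float:
    mu = sum(x * px for x, px in pf.items())
    return sum((x - mu) ** 2 * px for x, px in pf.items())

# Build the distribution of Y = aX + b directly and compare with a**2 * V(X)
a, b = 1000, 100
pf_Y = {a * x + b: px for x, px in pf_X.items()}
print(round(variance(pf_X), 4))                                        # 0.3875
print(round(variance(pf_Y), 2), round(a ** 2 * variance(pf_X), 2))    # 387500.0 387500.0
```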


Example 4.22 In Example 4.17 we looked at the total cost ran- 
dom variable Y = 1000X + 100, where X is the claim number random 
variable. In Example 4.21 we showed V(X) = .3875. Then


$$V(1000X + 100) = 1000^2(.3875) = 387{,}500.$$
□


4.4.3 Comparing Two Stocks
Suppose you are considering an investment in one of two stocks,
imaginatively named A and B. You have a forecast of the value of the stocks
in the future.
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. Forecast: The value of each stock will increase by 5% if the 
national economy stays as it is. If the economic outlook improves, Stock 
A will increase in value by 10% and Stock B will increase in value by 
1596. If the economic outlook deteriorates, Stock A will decrease in 
value by 10% and Stock B will decrease in value by 15%. You believe 
that probabilities for the future states of the economy are given by the 
following table: 


State of the economy | Deteriorate | Unchanged | Improve
Probability | .20 | .60 | .20
This information enables you to create probability function tables for the 
return on each of the two stocks. 


% Change in value of Stock A: a | -.10 | +.05 | +.10
Probability p(a) | .20 | .60 | .20

% Change in value of Stock B: b | -.15 | +.05 | +.15
Probability p(b) | .20 | .60 | .20


We cannot use expected value to choose between these stocks,
since they have the same expected value.

$$E(A) = (-.10)(.20) + .05(.60) + .10(.20) = .03$$

$$E(B) = (-.15)(.20) + .05(.60) + .15(.20) = .03$$


However, there is a real difference between the two stocks. There 
is much more variation in the return of Stock B than the return of Stock 
A. Modern financial theory says that Stock B is riskier than Stock A 
because of that increased variation. You can make a greater profit with 
B, but you risk a greater loss. 

One number that can be used to measure the risk in a stock is the 
standard deviation of returns. For the stocks above, we can easily 
compute the variances and standard deviations of the random variables 
representing change in value. 


$$V(A) = (-.10-.03)^2(.20) + (.05-.03)^2(.60) + (.10-.03)^2(.20) = .0046$$

$$V(B) = (-.15-.03)^2(.20) + (.05-.03)^2(.60) + (.15-.03)^2(.20) = .0096$$

Then $\sigma_A \approx .068$ and $\sigma_B \approx .098$. The standard deviation of the riskier
stock is higher.
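The stock comparison can be reproduced in a few lines of Python. This sketch encodes the two return tables above as parallel lists (our own representation, ordered deteriorate, unchanged, improve) and computes the mean, variance, and standard deviation of each; `mean_and_sd` is an illustrative helper name.

```python
import math

# Probabilities for (deteriorate, unchanged, improve)
probs = [0.20, 0.60, 0.20]
stock_a = [-0.10, 0.05, 0.10]   # change in value of Stock A
stock_b = [-0.15, 0.05, 0.15]   # change in value of Stock B


def mean_and_sd(values, probs):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    mu = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mu) ** 2 * p for v, p in zip(values, probs))
    return mu, var, math.sqrt(var)


mu_a, var_a, sd_a = mean_and_sd(stock_a, probs)
mu_b, var_b, sd_b = mean_and_sd(stock_b, probs)
print(f"A: mean={mu_a:.2f} var={var_a:.4f} sd={sd_a:.3f}")
print(f"B: mean={mu_b:.2f} var={var_b:.4f} sd={sd_b:.3f}")
```

The equal means and unequal standard deviations are exactly the point of the comparison.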

Modern finance texts use the standard deviation of an investment
as one possible measure of risk.² Many books of investment information
give the mean and standard deviation of recent historical returns for
stocks and mutual funds.³


4.4.4 z-scores; Chebychev's Theorem 


Example 4.23 In Example 4.13, we studied the probability distribution of grades for a class.

Grade (x) | 5 | 6 | 7 | 8 | 9 | 10
Probability p(x) | .05 | .10 | .45 | .20 | .10 | .10

The expected value is 7.5. The variance and standard deviation are

$$V(X) = .05(-2.5)^2 + .10(-1.5)^2 + .45(-0.5)^2 + .20(0.5)^2 + .10(1.5)^2 + .10(2.5)^2 = 1.550$$

and

$$\sigma_X = \sqrt{1.55} \approx 1.245.$$
Suppose a student scored 10 on this quiz. The student is 2.5 points above
the mean of 7.5. However, if we think of variability as measured in
standard deviation units, those 2.5 points are

$$\frac{10 - 7.5}{1.245} = \frac{2.5}{1.245} \approx 2.008$$

standard deviation units above the mean. We have just computed a z-score. □


2 See, for example, page 143 of Bodie et al. [1]. 

3 On page 146 of [1] you will find this information for the entire Standard and Poor's 
Composite index of common stocks, 1926-2002. The mean is 12.04% and the standard 
deviation is 20.55%. 
Definition 4.7 For any possible value x of a random variable X, the
z-score is

$$z = \frac{x - \mu}{\sigma}.$$

The z-score measures the distance of x from μ = E(X) in standard
deviation units.
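A minimal Python sketch of Definition 4.7, applied to the grade distribution of Example 4.23. It computes μ and σ from the table rather than using the rounded value 1.245, so a printed z-score may differ from a hand calculation in the last digit; `z_score` is an illustrative helper name.

```python
import math

# Grade distribution from Example 4.23: grade -> probability
grades = {5: 0.05, 6: 0.10, 7: 0.45, 8: 0.20, 9: 0.10, 10: 0.10}
mu = sum(x * p for x, p in grades.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in grades.items()))


def z_score(x, mu, sigma):
    """Definition 4.7: distance of x from the mean in standard deviation units."""
    return (x - mu) / sigma


for x in grades:
    print(x, round(z_score(x, mu, sigma), 3))
```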


Example 4.24 For the test example above, a student with a score
of 6 has a z-score of

$$z = \frac{6 - 7.5}{1.245} = \frac{-1.5}{1.245} \approx -1.205.$$

That student's score is approximately 1.205 standard deviations below
the mean. We could say that the student's score of 6 is within 1.21
standard deviations of the mean, since the score is below the mean by
less than 1.21 standard deviations. □


Definition 4.8 We say that a value x of the random variable X is
within k standard deviations of the mean if its z-score satisfies |z| ≤ k.


Example 4.25 In the grade example, the highest z-score is approximately 2.008. The lowest z-score is found for x = 5; it is -2.008.
Thus we could say that all of the scores are within 2.01 standard
deviations of the mean. This means that the probability is 1 that a score
will be within 2.01 standard deviations of the mean. Below we give all
the values of x with their approximate z-scores and probabilities.

Grade (x) | 5 | 6 | 7 | 8 | 9 | 10
z-score | -2.008 | -1.205 | -0.402 | 0.402 | 1.205 | 2.008
Probability p(x) | .05 | .10 | .45 | .20 | .10 | .10

The values 6, 7, 8, and 9 are within 1.21 standard deviations of the mean.
Then

$$P(X \text{ is within } 1.21 \text{ standard deviations of the mean}) = P(6 \le X \le 9) = .10 + .45 + .20 + .10 = .85.$$

For the original data, we could simply say that 85% of the scores are
within 1.21 standard deviations of the mean. □
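The calculation in Example 4.25 amounts to adding the probabilities of all values whose z-scores satisfy |z| ≤ k. A short sketch (the helper name `prob_within` is ours, not the text's):

```python
import math

grades = {5: 0.05, 6: 0.10, 7: 0.45, 8: 0.20, 9: 0.10, 10: 0.10}
mu = sum(x * p for x, p in grades.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in grades.items()))


def prob_within(dist, mu, sigma, k):
    """Total probability of the values whose z-scores satisfy |z| <= k."""
    return sum(p for x, p in dist.items() if abs((x - mu) / sigma) <= k)


print(round(prob_within(grades, mu, sigma, 1.21), 2))  # scores 6 through 9
print(round(prob_within(grades, mu, sigma, 2.01), 2))  # every score
```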
It is common to discuss the percentage of values of a random
variable that lie within a certain number of standard deviations of the 
mean. The results can vary widely from one random variable to another. 


Example 4.26 The claim amount distribution in Example 4.21
had μ = 350 and σ = √387,500 ≈ 622.495. The probability function
table with approximate z-scores is as follows:

Total claim amount (y) | 0 | 1000 | 2000 | 3000
z-score | -0.562 | 1.044 | 2.651 | 4.257
Probability p(y) | .72 | .22 | .05 | .01

For this distribution, the probability that Y is within 2.01 standard
deviations of the mean is .94, not 1.00 as in the previous example. □
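Applying the same idea to the claim-amount distribution of Example 4.26 (again a sketch, not from the text) reproduces the z-scores in the table and the probability .94.

```python
import math

# Claim-amount distribution: amount -> probability
claims = {0: 0.72, 1000: 0.22, 2000: 0.05, 3000: 0.01}
mu = sum(y * p for y, p in claims.items())
sigma = math.sqrt(sum((y - mu) ** 2 * p for y, p in claims.items()))

# Value, approximate z-score, probability
for y, p in claims.items():
    print(y, round((y - mu) / sigma, 3), p)

# Only y = 0 and y = 1000 have |z| <= 2.01
within = sum(p for y, p in claims.items() if abs((y - mu) / sigma) <= 2.01)
print(round(within, 2))
```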


Usually, results of this type depend on the specific probability
function being studied. However, there is a general result which
holds for all probability functions.


Chebychev's Theorem For any random variable X, the probability
that X is within k standard deviations of the mean is at least 1 - 1/k²:

$$P(\mu - k\sigma \le X \le \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$


Example 4.27 For the grade random variable, the mean was 7.5
and the standard deviation was approximately 1.245. Chebychev's
Theorem says that the probability that a grade is within 3 standard
deviations of the mean is at least 1 - 1/3², or approximately .889:

$$P(7.5 - 3(1.245) \le X \le 7.5 + 3(1.245)) = P(3.765 \le X \le 11.235) \ge 1 - \frac{1}{9} \approx .889$$

This last result is certainly true. All values of X are between 3.765 and
11.235, so the exact probability that X is in this range is 1.00. The true
probability of 1.00 is certainly greater than or equal to .889. □
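To see how conservative Chebychev's bound can be, the sketch below compares 1 - 1/k² with the exact probability P(μ - kσ ≤ X ≤ μ + kσ) for the grade distribution. The particular values of k are our own choice; the helper names are illustrative.

```python
import math

grades = {5: 0.05, 6: 0.10, 7: 0.45, 8: 0.20, 9: 0.10, 10: 0.10}
mu = sum(x * p for x, p in grades.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in grades.items()))


def chebychev_bound(k):
    """Lower bound 1 - 1/k^2 from Chebychev's Theorem (informative only for k > 1)."""
    return 1 - 1 / k ** 2


def exact_within(dist, mu, sigma, k):
    """Exact P(mu - k*sigma <= X <= mu + k*sigma) for a discrete table."""
    lo, hi = mu - k * sigma, mu + k * sigma
    return sum(p for x, p in dist.items() if lo <= x <= hi)


for k in (1.5, 2, 3):
    print(k, round(chebychev_bound(k), 3), round(exact_within(grades, mu, sigma, k), 3))
```

In every row the exact probability is at least the bound, as the theorem guarantees, and usually by a wide margin.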


Chebychev's Theorem was quite conservative here: it estimated a 
lower bound of .889 for a probability that was actually 1.00. For the 
distributions studied in this text, we will calculate exact probabilities for
problems like this. Chebychev’s Theorem will see very little use. 


4.5 Population and Sample Statistics 
4.5.1 Population and Sample Mean 


Most people are familiar with the calculation of an average or mean for a 
set of numbers, such as the test scores for a class. Modern calculator 
technology makes this calculation easy. However, it takes a little work to 
relate our standard deviation calculations to calculator technology. This 
is required because most calculators have two different standard 
deviation keys — one for a population and one for a sample. The 
difference between a population and a sample can be illustrated by 
returning to our probability function for the number of claims X filed by 
a policyholder with a large insurance company. 


Number of claims (x) | 0 | 1 | 2 | 3
Probability p(x) | .72 | .22 | .05 | .01


This is the probability function for all policyholders of the
company: the entire population of policyholders. The mean and
standard deviation were calculated in Examples 4.16 and 4.21 by using
the probabilities above and the formulas

$$\mu = \sum x \cdot p(x) = .35$$

and

$$\sigma = \sqrt{\sum (x - \mu)^2 \cdot p(x)} \approx .622495.$$


Suppose the company had n = 100,000 policyholders and had
compiled the above table by looking at all records to obtain the following table:

Number of claims (x) | 0 | 1 | 2 | 3
Number of policyholders with x claims (f) | 72,000 | 22,000 | 5,000 | 1,000
If we rewrite each p(x) as f/n, the formulas for population mean and
standard deviation can be rewritten as follows:

Population Mean and Standard Deviation

$$\mu = \frac{1}{n}\sum f \cdot x \qquad (4.7a)$$

$$\sigma = \sqrt{\frac{1}{n}\sum f \cdot (x - \mu)^2}$$


These formulas essentially add up all 100,000 individual values instead 
of using the probability table. They are equivalent, and give the correct 
answers for the entire population. 
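The frequency form of the population formulas can be checked on the 100,000-policyholder table. In this sketch the dictionary maps the number of claims x to the number of policyholders f; the divisor is n, as in the population formulas.

```python
import math

# Number of claims x -> number of policyholders f (entire population)
freq = {0: 72_000, 1: 22_000, 2: 5_000, 3: 1_000}
n = sum(freq.values())

mu = sum(f * x for x, f in freq.items()) / n                          # (4.7a)
sigma = math.sqrt(sum(f * (x - mu) ** 2 for x, f in freq.items()) / n)  # population SD

print(n, mu, round(sigma, 6))
```

The results agree with μ = .35 and σ ≈ .622495 obtained from the probability table.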

In many cases, it is not possible to gather complete data on an 
entire population. Then people who need information might take a 
sample of records to get an estimate of the mean and standard deviation 
of the population. Suppose an analyst does not know the true values of μ
and σ for the entire company population. She picks a sample of n = 10
policyholder records at random from the company files, and finds the
following numbers of claims on the 10 records.


0,0,1,0,2,0,0,0,1,0 


This sample leads to the following frequency table.

Number of claims (x) | 0 | 1 | 2
Number of policyholders with x claims (f) | 7 | 2 | 1


There are now two means and two standard deviations to consider: (a)
the original population mean and standard deviation, which are unknown
to the analyst, and (b) the sample mean and standard deviation. We
picture this as follows:

[Figure: Population (μ, σ unknown) → Sample (known data; can calculate mean and standard deviation)]
To estimate the true mean and standard deviation, the analyst would
compute the sample mean and sample standard deviation from the
sample values using a slightly different set of formulas. The difference is
that the sum of squares in the standard deviation formula is divided by
n - 1 instead of n when the calculation is done for sample data. This is
done to make the estimates come out better on the average,⁴ but the
details are the subject of another course. The real issue here is that
calculations using sample data require a new and different formula.


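Sample Mean and Standard Deviation

$$\bar{x} = \frac{1}{n}\sum f \cdot x \qquad \text{and} \qquad s = \sqrt{\frac{1}{n-1}\sum f \cdot (x - \bar{x})^2}$$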


For the sample data above,

$$\bar{x} = \frac{1}{10}(7 \cdot 0 + 2 \cdot 1 + 1 \cdot 2) = .40,$$

$$s = \sqrt{\frac{1}{9}\left[7(0 - .40)^2 + 2(1 - .40)^2 + 1(2 - .40)^2\right]} \approx .699206.$$
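The sample formulas differ from the population formulas only in the divisor. This sketch computes x̄ and s for the ten sampled records and, for contrast, what the population-style divisor n would give on the same ten values (the name `sigma_style` is ours).

```python
import math

# Numbers of claims on the 10 sampled records
sample = [0, 0, 1, 0, 2, 0, 0, 0, 1, 0]
n = len(sample)

x_bar = sum(sample) / n
sum_sq = sum((x - x_bar) ** 2 for x in sample)

s = math.sqrt(sum_sq / (n - 1))        # sample standard deviation (divide by n - 1)
sigma_style = math.sqrt(sum_sq / n)    # population-style divisor n, for comparison only

print(x_bar, round(s, 6), round(sigma_style, 6))
```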


These numbers are estimates of μ and σ; the analyst did not know those
values (and still does not). A major difference between statistics and
probability is that the subject of statistics deals primarily with estimating
unknown values like μ and σ from sample data, whereas probability
deals with solving problems for populations with known (or assumed) 
distributions. More will be said about this in later sections. This text 
covers probability and deals very little with estimation from sample data. 
However, it is important for the student to realize that the concepts of 
mean and standard deviation are widely used in two different ways with 
two different sets of formulas. This occasionally leads to confusion in 
calculator use. 


⁴ The technical term is that the estimators x̄ and s² are unbiased.
4.5.2 Using Calculators for the Mean and Standard Deviation


Modern calculators typically give both the sample and population standard deviations. Thus the student must be familiar with both and be able
to determine which one is required for any given problem.

The TI-83 calculator calculates both sample and population standard deviation. On this calculator, the values of x and the frequencies f
are entered in separate lists, say, L1 and L2. Then the command

1-Var Stats L1, L2

will lead to a screen which shows the mean as x̄, sample standard deviation as Sx, and population standard deviation as σx.

The TI BA II Plus calculator has a STAT menu. Under the 1-V
option the calculator will show the mean as x̄, sample standard deviation
as Sx, and population standard deviation as σx, just as the TI-83 does.

In Microsoft® EXCEL the function AVERAGE gives the mean,
the function STDEV gives the sample standard deviation and the
function STDEVP gives the population standard deviation.
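Python's standard `statistics` module draws the same distinction as the calculators and spreadsheet functions above: `statistics.stdev` uses the n - 1 divisor (sample) and `statistics.pstdev` uses n (population). A minimal sketch using the sample of ten records; expanding the frequency table into raw values is our own step.

```python
import statistics

# Expand the frequency table (7 zeros, 2 ones, 1 two) into the raw sample values
sample = [0] * 7 + [1] * 2 + [2] * 1

print(statistics.mean(sample))     # sample mean, like AVERAGE
print(statistics.stdev(sample))    # divides by n - 1, like STDEV
print(statistics.pstdev(sample))   # divides by n, like STDEVP
```

As with a calculator, the software cannot know whether the data are a sample or a whole population; choosing the function is the user's job.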


4.6 Exercises 


4.2 The Probability Function of a Discrete Random Variable


4-1. Let X be the random variable for the number of heads obtained 
when three fair coins are tossed. What is the probability function 
for X? 


4-2. Ten cards are face down in a row on a table. Exactly one of them 
is an ace. You turn the cards over one at a time, moving from 
left to right. Let X be the random variable for the number of 
cards turned before the ace is turned over. What is the 
probability function for X? 


4-3. A fair die is rolled repeatedly. Let X be the random variable for
the number of times the die is rolled before a six appears. What 
are the probability function and the cumulative distribution func- 
tion for X? 
4-4. Let X be the random variable for the sum obtained by rolling
two fair dice. What are the p(x) and F(x) functions for X?


4.3 Measuring Central Tendency; Expected Value

4-5. For the X defined in Exercise 4-4, what is E(X)?


4-6. The GPA (grade point average) random variable X assigns to
the letter grades A, B, C, D and E the numerical values 4, 3, 2, 1
and 0. Find the expected value of X for a student selected at
random from a class in which there were 15 A grades, 33 B
grades, 51 C grades, 6 D grades, and 3 E grades. (This expected
value can be thought of as the class average GPA for the
course.)


4-7. A construction company whose workers are used on high-risk
projects insures its workers against injury or death on the job.
One unit of insurance for an employee pays $1,000 for an injury
and $10,000 for death. Studies have shown that in a year 7.3%
of the workers suffer an injury and 0.41% are killed. What is the
expected unit claim amount (pure premium) for this insurance?
If the company has 10,000 employees and exactly 7.3% are
injured and exactly 0.41% are killed, what is the average cost
per unit of the insurance claims?


4-8. Suppose that in the above problem the administrative costs are
$50 per person insured. The company purchases 10 units of
insurance for each worker. Let X be the total of expected claim
amount and administrative costs for each worker. Find E(X).


4-9. Verify Equation (4.4b).


4-10. Let X be the random variable for the number of times a fair die
is tossed before a six appears (Exercise 4-3). Find E(X).


4-11. The mode of a probability function does not have to be unique.
Find the mode of the probability function in Exercise 4-1, for the
random variable for the number of heads obtained when three
fair coins are tossed.
4.4 Variance and Standard Deviation

4-12. If X is the random variable for the sum obtained by rolling two
fair dice (Exercise 4-4), what is V(X)?


4-13. For the insurance policy that pays $1,000 for an injury and
$10,000 for death (Exercise 4-7), what is the standard deviation
for the claim amount on 5 units of insurance? (Note: Some
employees receive $0 of claim payment. This value of the
random variable must be included in your calculation.)


4-14. Verify Equation (4.5b). (Hint: It is sufficient to show that
V(X + b) = V(X). If Y = X + b and E(X) = $\mu_X$, what is
$\mu_Y$?)


4-15. Let X be the random variable for the sum obtained by rolling
two fair dice (Exercise 4-4).

(a) Using Chebychev's Theorem, what is a lower bound for
the probability that the value of X is within 2 standard
deviations of the mean of X?

(b) What is the exact probability that this sum is within this
range?


4.5 Population and Sample Statistics

4-16. An auto insurance company has 15,000 policyholders with
comprehensive automobile coverage. In the past year 11,425
filed no claims, 3,100 filed one claim, 385 filed two claims, and
90 filed three claims. What are the mean and the standard
deviation for the number of claims filed by a policyholder?


4-17. A marketing company polled 50 people at a mall about the
number of movies they had seen in the previous month. The
results of this poll are as follows:

[Table: number of movies seen (x) and number of people (f)]

What are the sample mean and sample standard deviation for the
number of movies seen by an individual in a month?
4.7 Sample Actuarial Examination Problems

4-18. A probability distribution of the claim sizes for an auto insurance
policy is given in the table below:

[Table: Claim Size and Probability]

What percentage of the claims are within one standard deviation of
the mean claim size?


4-19. A recent study indicates that the annual cost of maintaining and
repairing a car in a town in Ontario averages 200 with a variance
of 260.

If a tax of 20% is introduced on all items associated with the
maintenance and repair of cars (i.e., everything is made 20% more
expensive), what will be the variance of the annual cost of
maintaining and repairing a car?


4-20. A tour operator has a bus that can accommodate 20 tourists. The
operator knows that tourists may not show up, so he sells 21
tickets. The probability that an individual tourist will not show up
is 0.02, independent of all other tourists.

Each ticket costs 50, and is non-refundable if a tourist fails to
show up. If a tourist shows up and a seat is not available, the tour
operator has to pay 100 (ticket cost + 50 penalty) to the tourist.

What is the expected revenue of the tour operator?


Chapter 5 
Commonly Used Discrete 
Distributions 


In Chapter 4 we saw a number of examples of discrete probability 
distributions. In this chapter we will study some special distributions that 
are extremely useful and widely applied. Examples of some of these 
distributions have already appeared in Chapter 4. 


5.1 The Binomial Distribution 


We have already seen an example of a binomial distribution problem: 
tossing a coin three times and finding the probability of observing 
exactly two heads. The binomial distribution is useful for modeling 
problems in which you need to find probabilities for the number of 
successes in a series of independent trials: how many times will you toss
a head, hit a target, or guess a right answer on a test? We will introduce
the binomial distribution by looking at the coin-tossing example. 


5.1.1 Binomial Random Variables 


Suppose you are going to toss a fair coin three times and record the num- 
ber of heads X. The process of tossing the coin three times and 
observing whether or not each toss is a head is called a binomial 
experiment because it satisfies all the conditions given in the following 
definition. 
Definition 5.1 An experiment is called a binomial experiment if
all of the following hold:


(a) The experiment consists of n identical trials.

(b) Each trial has exactly two outcomes, which are usually
referred to as success (S) or failure (F).

(c) The probability of success on each individual trial is always
the same number P(S) = p. (The probability of failure is
then always P(F) = 1 - p. It is traditional to use the notation P(F) = q = 1 - p.)

(d) The trials are independent.


Definition 5.2 If X is the number of successes in a binomial
experiment, X is called a binomial random variable.


Example 5.1 A fair coin is tossed three times and the number of
heads X is recorded. The experiment is a binomial experiment since all
of the following hold:


(a) There are n = 3 identical trials (coin tosses).

(b) Each trial has two outcomes: heads (a success, S) or tails (a
failure, F).

(c) The probability of success is the same on each trial; in this
case, P(S) = P(H) = .50 for each toss.

(d) Successive tosses of a fair coin are independent.


Thus X is a binomial random variable. □


Example 5.2 A student takes a multiple choice examination with
n = 10 questions. He has not attended class or studied for three weeks
and plans to guess on each question by having his calculator display a
random integer from 1 to 5. (There are 5 choices for each question.) Let
X be the number of questions out of 10 for which the student guesses
correctly. Then X is a binomial random variable, since all of the
following hold:


(a) There are n = 10 identical trials.

(b) Each trial has two outcomes: right (a success, S) or wrong (a failure, F).

(c) P(S) = p = 1/5 = .20 on each trial.

(d) Successive guesses are independent. □
5.1.2 Binomial Probabilities


In Section 3.4.2 we used the multiplication rule for independent events 
to show that the probability of tossing 3 heads in a row with a fair coin 
was 1/8. That was an example of a binomial probability problem — we 
found the probability P(X = 3) for the binomial random variable X in 
Example 5.1. There is a formula which will enable us to find P(X = k) 
for any binomial random variable X and any k. We will show how this 
formula works by looking at the example of tossing a fair coin 3 times. 


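Example 5.3 Below is the tree for three tosses of a fair coin.
Probabilities for each branch are included.

[Figure: probability tree for three tosses of a fair coin, with probability 1/2 on each branch]

Outcome | HHH | HHT | HTH | HTT | THH | THT | TTH | TTT
Probability | 1/8 | 1/8 | 1/8 | 1/8 | 1/8 | 1/8 | 1/8 | 1/8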


Let X be the number of heads observed. There is only one branch
(HHH) with X = 3. Since the probability of each branch is 1/8,

$$P(X = 3) = (\text{number of branches with 3 heads}) \cdot \frac{1}{8} = 1 \cdot \frac{1}{8} = \frac{1}{8}.$$

This reasoning works for any possible value of X. For example,

$$P(X = 2) = (\text{number of branches with 2 heads}) \cdot \frac{1}{8} = 3 \cdot \frac{1}{8} = \frac{3}{8}.$$
□
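The branch-counting argument can be mimicked by listing all eight outcomes explicitly. This sketch uses `itertools.product` to generate the tree's branches and `fractions.Fraction` to keep the probabilities exact; `prob_heads` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# Every branch of the tree for three tosses, e.g. ('H', 'T', 'H')
branches = list(product("HT", repeat=3))
branch_prob = Fraction(1, 2) ** 3   # each branch has probability 1/8


def prob_heads(k):
    """P(X = k) = (number of branches with k heads) * (1/8)."""
    count = sum(1 for b in branches if b.count("H") == k)
    return count * branch_prob


for k in range(4):
    print(k, prob_heads(k))
```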
The results above could also have been obtained from the general
formula for P(X = k).


Binomial Distribution


If X is a binomial random variable with n trials and P(S) = p,

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} = \binom{n}{k} p^k q^{n-k} \qquad (5.1)$$

for k = 0, 1, ..., n.


Example 5.4 Let X be the number of heads in 3 tosses of a fair
coin. Then n = 3 and p = 1/2. Using Equation (5.1) for k = 2, we can
replicate the value of P(X = 2) obtained in the last example.

$$P(X = 2) = \binom{3}{2}\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^1 = 3\left(\frac{1}{8}\right) = \frac{3}{8}$$

Note that the term $\binom{3}{2}$ gives the number of branches with exactly 2
heads, and the term $(1/2)^2(1/2)^1$ gives the probability of a single branch
with 2 heads. □


The example should make clear the meaning of the terms in
Equation (5.1).


(1) $p^k q^{n-k}$ gives the probability of a single branch with exactly
k successes.
(2) $\binom{n}{k}$ gives the number of branches with exactly k successes.


Example 5.5 We return to the student who is guessing on a ten-question multiple choice quiz, with n = 10 and p = .20. The probability
that the student gets exactly 2 questions right is

$$\binom{10}{2}(.20)^2(.80)^8 \approx .30199.$$

The probability that the student who guessed on all 10 questions got only
2 right answers is approximately .302. There is some justice in this. □
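Equation (5.1) translates directly into code. The sketch below uses Python's `math.comb` (available in Python 3.8 and later) for the binomial coefficient; `binomial_pmf` is an illustrative name, and the two printed values correspond to Examples 5.4 and 5.5.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Equation (5.1): C(n, k) * p^k * q^(n - k), with q = 1 - p."""
    if not 0 <= k <= n:
        return 0.0
    q = 1 - p
    return comb(n, k) * p ** k * q ** (n - k)


print(binomial_pmf(2, 3, 0.5))               # Example 5.4
print(round(binomial_pmf(2, 10, 0.20), 5))   # Example 5.5
```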
5.1.3 Mean and Variance of the Binomial Distribution


The mean and variance of a binomial distribution depend on the under- 
lying values of n and p. It is not too hard to find the mean and variance 
for a binomial distribution when there is only one trial — i.e., with 
n = 1. The probability distribution for a binomial random variable with
n = 1 and P(S) = p is given below.


Number of successes (x) | 0 | 1
Probability p(x) | q | p

$$E(X) = q \cdot 0 + p \cdot 1 = p$$

$$V(X) = E[(X - p)^2] = q(0 - p)^2 + p(1 - p)^2 = qp^2 + pq^2 = pq(p + q) = pq$$


Exercise 5-10 asks the reader to show that for a binomial random
variable X with n = 2 and P(S) = p,

$$E(X) = 2p$$
and
$$V(X) = 2pq.$$


The general formulas for the mean and variance of any binomial 
distribution X follow the pattern established above. Methods for proving 
these rules in general will be developed later in the text. 


Binomial Distribution Mean and Variance
If X is a binomial random variable with n trials and P(S) = p,

$$E(X) = np \qquad (5.2a)$$

and

$$V(X) = np(1 - p) = npq. \qquad (5.2b)$$


Example 5.6 Let X be the number of heads in 3 tosses of a fair
coin. Since X is binomial with n = 3 and p = .50,

$$E(X) = 3(.50) = 1.5 \quad \text{and} \quad V(X) = 3(.50)(1 - .50) = .75.$$
□
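As a numerical check of (5.2a) and (5.2b), not a proof, this sketch computes E(X) and V(X) directly from the binomial probability function and prints them beside np and npq for the coin example and the guessing-student example. The helper names are our own.

```python
from math import comb


def binomial_pmf(k, n, p):
    """C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def mean_var_from_pmf(n, p):
    """Compute E(X) and V(X) by summing over the probability function."""
    mu = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    var = sum((k - mu) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mu, var


for n, p in [(3, 0.50), (10, 0.20)]:
    mu, var = mean_var_from_pmf(n, p)
    print(n, p, round(mu, 6), round(var, 6), round(n * p, 6), round(n * p * (1 - p), 6))
```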
Example 5.7 Let X be the number of correct answers for a
student guessing on a 10-question (n = 10) multiple choice test with 5
choices on each question (p = .20).

$$E(X) = 10(.20) = 2 \quad \text{and} \quad V(X) = 10(.20)(.80) = 1.6$$
□


Technology Note 


We have already noted that calculators like the TI-83 or TI BA II
Plus will calculate the coefficient $\binom{n}{k}$ needed for the binomial probability formula. Thus it is fairly easy to calculate binomial probabilities
on these calculators. Since the binomial distribution is widely used, 
many calculators and computer packages have special functions for 
finding binomial probabilities. On the TI-83, entering 


binompdf(10, .20, 2) 


gives the probability of .30199 found in Example 5.5. (The function 
binompdf( ) can be found in the DISTR menu.) 

Microsoft® EXCEL has a function BINOMDIST which finds
binomial probabilities. The statistical package MINITAB will quickly
give the entire probability distribution for a binomial random variable X.
Below is the entire probability distribution for the binomial random
variable X with n = 10 and p = .20, as calculated by MINITAB.


Binomial (10, .20) 
The last two probabilities in the MINITAB printout are not 0; they
round to 0 when four decimal places are used. The computer-generated
table can be used to rapidly answer questions about the binomial experiment.


Example 5.8 Consider the guessing student with n = 10 and
p = .20. What is the probability that he has 6 or more correct answers?

$$P(X \ge 6) = .0055 + .0008 + .0001 + .0000 + .0000 = .0064$$

The guessing student will score 60% or more on this quiz less than 1%
of the time. □
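In place of the MINITAB printout, the following sketch generates the full Binomial(10, .20) probability function rounded to four places and then sums the upper tail for Example 5.8. The output layout is our own and only approximates MINITAB's.

```python
from math import comb

n, p = 10, 0.20
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

# Full probability function, printed to four decimal places
for k, prob in enumerate(pmf):
    print(f"{k:2d}  {prob:.4f}")

# Example 5.8: P(X >= 6), summed from the unrounded values
print(f"P(X >= 6) = {sum(pmf[6:]):.4f}")
```

Summing unrounded values avoids the small rounding drift that can appear when four-place table entries are added.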


5.1.4 Applications 


Example 5.9 (Insurance) The 1979-81 United States Life Table
given in Bowers et al. [2] gives the probability of death within one year
for a 57-year-old person as .01059. (In actuarial notation, $q_{57} = .01059$.)
Suppose that you are an insurance agent with 10 clients who have just
reached age 57. You are willing to assume that deaths of the clients are
independent events.

(a) What is the probability that all 10 survive the next year?

(b) What is the probability that 9 will survive and exactly one
will die during the next year?


Solution If client deaths are independent, the number of survivors
X will be a binomial random variable with parameters n = 10 and
p = 1 - .01059 = .98941.

(a) $P(X = 10) = \binom{10}{10}(.98941)^{10} \approx .89901$

(b) $P(X = 9) = \binom{10}{9}(.98941)^9(.01059)^1 \approx .09622$ □
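Example 5.9 can be checked with the same binomial formula. The sketch assumes, as the example does, that deaths are independent with $q_{57}$ = .01059, so the number of survivors is binomial with p = .98941.

```python
from math import comb

n = 10
p_survive = 1 - 0.01059   # 1 - q_57, from the life table value quoted in the text


def binomial_pmf(k, n, p):
    """C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


print(round(binomial_pmf(10, n, p_survive), 5))  # (a) all 10 survive
print(round(binomial_pmf(9, n, p_survive), 5))   # (b) exactly 9 survive, 1 dies
```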


Example 5.10 (Polling) Suppose you live in a large city which 
has 1,000,000 registered voters. The voters will vote on a bond issue in 
the next month, and you want to estimate the percent of the voters who 
favor the issue. You cannot ask each of 1,000,000 people for his or her 
opinion, so you decide to randomly select a sample of 100 voters and ask 
each of them if they favor the issue. What are your chances of getting 
reasonably close to the true percentage in favor of the issue? 




Solution To answer this question concretely, we will make an
assumption. Suppose the true percent of the voters who favor the bond
issue is 65%. You don't know this number; you are trying to estimate it.
In polling voters, you are really doing a binomial experiment. A success
S is finding a voter in favor of the bond issue, and P(S) = p = .65. You
are polling 100 voters, so n = 100. Your random selection is designed to
make the successive voter opinions independent. Below is a table of
probabilities p(x) and cumulative probabilities F(x) for values of x from
59 to 70.


x | p(x)
60 | 0.0474
61 | 0.0577
62 | 0.0674
63 | 0.0755
64 | 0.0811
65 | 0.0834
66 | 0.0821
67 | 0.0774
68 | 0.0698
69 | 0.0601
70 | 0.0494


The probability that 65 out of the 100 voters sampled favor the bond
issue is .0834, so that you will estimate the true percentage of 65%
exactly with a probability of .0834. The probability that your estimate is
in the range 60% to 70% is the sum of all the p(x) values above, since it
equals

$$P(60 \le X \le 70) = p(60) + p(61) + \cdots + p(70).$$

The cumulative distribution function F(x) helps to simplify this calculation, since

$$P(60 \le X \le 70) = P(X \le 70) - P(X \le 59) = .8764 - .1250 = .7514$$

to four places.¹ Even though you do not know the true value of p = .65,
your estimate will be in the range .60 to .70 with probability .7514. □
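The polling probabilities come from the Binomial(100, .65) distribution. This sketch computes p(65) and P(60 ≤ X ≤ 70) = F(70) - F(59) by summing the probability function directly; a statistics library would be more efficient for large n, but plain summation keeps the example self-contained. The function names are illustrative.

```python
from math import comb

n, p = 100, 0.65   # sample size and assumed true proportion in favor


def binomial_pmf(k):
    """p(k) for the Binomial(100, .65) distribution."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_cdf(x):
    """F(x) = P(X <= x), by summing p(0), ..., p(x)."""
    return sum(binomial_pmf(k) for k in range(x + 1))


exact_65 = binomial_pmf(65)
in_range = binomial_cdf(70) - binomial_cdf(59)   # P(60 <= X <= 70)
print(round(exact_65, 4), round(in_range, 4))
```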


¹ The p(x) values add to .7513 due to rounding.
Polling problems are really statistical estimation problems. A
statistics course would demonstrate how to increase sample size to give 
an even higher probability of getting an estimate very close to the true 
value of p. However, the statistical methods taught in other classes are 
based on the kind of reasoning used in the last example. 


5.1.5 Checking Assumptions for Binomial Problems 


There are some applied problems in textbooks in which independence of 
trials is questionable. A standard example is the following problem: 


A baseball player has a batting average of .350.² What is the
probability that he gets exactly 4 hits in his next 10 at bats? 


This problem usually appears at the end of the section on binomial 
probabilities. The obvious intent is to treat the next 10 at bats as n = 10 
independent trials with p = .350 on each trial. Many students question 
this problem, either because they do not believe that successive at bats 
are independent or they do not believe that p = .350 on each trial. (The 
authors also question these assumptions.) The best way to simplify this 
situation for the student is simply to add a clause to the problem: 


Assume that successive at bats are independent and the same 
value of p applies in each at bat. 


The polling problem in Example 5.10 also raises issues about the 
validity of assumptions. The usual method of sampling voters is called 
sampling without replacement. Once you have polled a specific voter, 
you will not sample him or her again. This means that when the first 
voter is selected for polling, the next selection will not be from all 
1,000,000 voters, but from the remaining 999,999. This changes the 
probability of favoring the bond issue very slightly for the second trial. 
The usual response to this problem is to say that with 1,000,000 voters 
and a sample of only 100, the removal of a few voters changes things 
very little on each trial, and it is still reasonable to use the binomial 
probability model. This practical argument depends heavily on the 
underlying population being very large and the sample very small in 
comparison. In the next section we will introduce the hypergeometric 
distribution, which will handle sampling without replacement exactly for 
any population size. 


2 This often gives textbook authors a chance to put in their favorite hitters, so that the 
problem becomes the Ted Williams problem or the Tony Gwynn problem. 
5.2 The Hypergeometric Distribution
5.2.1 An Example 


We have already solved counting problems that were truly sampling 
without replacement problems in Chapter 3. The first of these problems 
was in Example 3.6, which is reviewed below. 


Example 5.11 In Example 3.6, we looked at a company with 20
male employees and 30 female employees. The company is going to
choose 5 employees at random for drug testing. We found, for example,
that the probability of choosing a group of 3 males and 2 females is

$$\frac{\binom{20}{3}\binom{30}{2}}{\binom{50}{5}} = \frac{495{,}900}{2{,}118{,}760} \approx .234.$$


The numerator in the above expression is the product of (a) the 
number of ways to choose 3 males from 20, and (b) the number of ways 
to choose 2 females from 30. The denominator represents the number of 
ways to choose a random sample of 5 from 50 people. 

It is easy to follow the reasoning in this calculation and find the
probability that the group selected for testing contains any number of
females between 0 and 5. If X is the number of females selected, then

$$P(X = k) = \frac{\binom{20}{5-k}\binom{30}{k}}{\binom{50}{5}}, \qquad k = 0, 1, 2, 3, 4, 5.$$


The probability function for X is given in the following table:

Number of females (x) | 0 | 1 | 2 | 3 | 4 | 5
Probability p(x) | .0073 | .0686 | .2341 | .3641 | .2587 | .0673
The problem of selecting five employees for testing is a sampling without replacement problem. Once a person is selected for a drug test, that
person is no longer in the pool for future selection. This makes
successive selections dependent on what has gone before. Originally the
pool of employees is 40% male and 60% female. If a male is selected on
the first pick, the remaining pool consists of 49 people. The proportion
of males changes to 19/49 ≈ .388 and the proportion of females changes
to 30/49 ≈ .612. □


5.2.2 The Hypergeometric Distribution 


The probability function given for the number of females selected in 
Example 5.11 is hypergeometric. A useful intuitive interpretation of the 
hypergeometric distribution can be obtained from Example 5.11. 


(1) A sample of size n is being taken from a finite population of
size N. In Example 5.11, N = 50 (the number of employees
in the entire company) and n = 5 (the size of the group
selected for testing).

(2) The population has a subgroup of size r > n that is of
interest. In our problem, there were r = 30 females in the
population of 50. We were interested in the number of
females in the group selected for testing.

(3) The random variable of interest is X, the number of 
members of the subgroup in the sample taken. In Example 
5.11, X is the number of females in the group selected for 
testing. 

(4) The probability function for X is given below. 


Hypergeometric Distribution

$$P(X = k) = \frac{\binom{r}{k}\binom{N-r}{n-k}}{\binom{N}{n}}, \quad k = 0, 1, 2, \ldots, n \qquad (5.3)$$


3 All applications here will satisfy r > n, and this is the most common
situation. If we do not require r > n, the formula will still be applicable,
with k ranging from max(0, n + r − N) to min(r, n).
A common textbook example of the hypergeometric distribution
involves testing for defective parts. This was covered in Example 3.7,
and is reviewed here.


Example 5.12 A manufacturer receives a shipment of 50 parts. 20 
of the parts are defective. The manufacturer does not know this number, 
and is going to test a sample of 5 parts chosen at random from the 
shipment. 

Solution In this problem there is a population of N = 50 parts. A
sample of size n = 5 will be taken. The manufacturer would like to
study the subgroup of defective parts, and this subgroup has r = 20
members. The random variable of interest is X, the number of defective
parts in the sample of size 5. The probability function for X is


$$P(X = k) = \frac{\binom{20}{k}\binom{30}{5-k}}{\binom{50}{5}}, \quad k = 0, 1, 2, 3, 4, 5.$$ □

5.2.3 The Mean and Variance of the Hypergeometric Distribution 


The mean and variance of the hypergeometric distribution are given 
without proof by the following: 


Hypergeometric Distribution Mean and Variance 


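$$E(X) = n\left(\frac{r}{N}\right) \qquad (5.4a)$$

$$V(X) = n\left(\frac{r}{N}\right)\left(1 - \frac{r}{N}\right)\left(\frac{N - n}{N - 1}\right) \qquad (5.4b)$$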


An example will enable us to relate this to the binomial distribution 
mean and variance. 


Example 5.13 We return to the parts testing of Example 5.12. A 
sample of size n = 5 was taken from a population of size N = 50 which 
contained r = 20 defectives. If X is the number of defectives, the mean 
number of defectives in a sample is 


$$E(X) = 5\left(\frac{20}{50}\right) = 5(.40) = 2.$$
In this problem, we are conducting n = 5 trials in which a success
S occurs if and when we find a defective part. On the first trial,
P(S) = 20/50 = .40 = p. Since parts are not replaced, the probability of S
on later trials, given the results of earlier trials, changes, but the mean
is still np = 5(.40) = 2, as in the binomial case.

A similar relationship appears when we find the variance of the 
number of defective parts in the sample. 


$$V(X) = 5\left(\frac{20}{50}\right)\left(1 - \frac{20}{50}\right)\left(\frac{50 - 5}{50 - 1}\right) = 5(.40)(.60)\frac{45}{49} \approx 1.102.$$


A binomial distribution with n = 5 and p = .40 would have a variance of
npq = 5(.40)(.60) = 1.20. The hypergeometric variance is adjusted by
multiplying 1.20 by 45/49. The final factor in the hypergeometric variance,
(N − n)/(N − 1), is often called the finite population correction factor.
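Formulas (5.4a) and (5.4b) are stated without proof, but they can be checked numerically by summing over the probability function. The sketch below is a minimal Python example. It assumes Python 3.8+ for `math.comb`, and the function names are hypothetical helpers written for this illustration.

```python
from math import comb

def hyper_pmf(k, N, r, n):
    """Equation (5.3): k of the r subgroup members in a sample of n from N."""
    return comb(r, k) * comb(N - r, n - k) / comb(N, n)

def hyper_moments_direct(N, r, n):
    """Mean and variance straight from the definitions: E(X) and E(X^2) - E(X)^2."""
    support = range(max(0, n + r - N), min(r, n) + 1)
    mean = sum(k * hyper_pmf(k, N, r, n) for k in support)
    second = sum(k * k * hyper_pmf(k, N, r, n) for k in support)
    return mean, second - mean ** 2

def hyper_moments_formula(N, r, n):
    """Equations (5.4a) and (5.4b), including the finite population correction."""
    p = r / N
    return n * p, n * p * (1 - p) * (N - n) / (N - 1)

print(hyper_moments_direct(50, 20, 5))    # Example 5.13
print(hyper_moments_formula(50, 20, 5))
```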


5.2.4 Relating the Binomial and Hypergeometric Distributions 


Both the binomial and hypergeometric distributions can be thought of as
involving n success-failure trials. In binomial problems, successive trials
are independent and have the same success probability. In hypergeometric
problems, successive trials are influenced by what has happened before
and the success probability changes. When the population is large and
the sample is small, the hypergeometric distribution looks much like the
binomial. Meyer [10] states that "In general, the approximation of the
hypergeometric distribution by the binomial is very good if n/N ≤ .10."⁴
In our Example 5.13, we found n/N = 5/50 = .10. For the reader's
comparison, the probability tables for the hypergeometric distribution
with N = 50, n = 5 and r = 30, and for the binomial with n = 5 and
p = .60, are shown at the bottom of page 117.

Binomial: n = 5, p = 0.6        Hypergeometric: N = 50, n = 5, r = 30

Successes in sample x     Binomial p(x)     Hypergeometric p(x)
0                         0.0102            0.0073
1                         0.0768            0.0686
2                         0.2304            0.2341
3                         0.3456            0.3641
4                         0.2592            0.2587
5                         0.0778            0.0673

4 See page 176.
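The comparison table was produced in EXCEL. The same numbers can also be generated with the short Python sketch below. This is our own illustration, assumes Python 3.8+ for `math.comb`, and does not reproduce EXCEL's HYPGEOMDIST.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial P(X = k): independent trials with constant success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hyper_pmf(k, N, r, n):
    """Hypergeometric P(X = k), Equation (5.3)."""
    return comb(r, k) * comb(N - r, n - k) / comb(N, n)

n, N, r = 5, 50, 30
p = r / N  # 0.6, the success probability on the first draw
print("x  binomial  hypergeometric")
for k in range(n + 1):
    print(f"{k}  {binom_pmf(k, n, p):.4f}    {hyper_pmf(k, N, r, n):.4f}")
```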


Technology Note 


The formulas for hypergeometric probabilities use the combinatorial
coefficients $\binom{n}{k} = C(n,k)$ and can easily be calculated on
modern calculators. Microsoft® EXCEL has a spreadsheet function
HYPGEOMDIST which calculates hypergeometric probabilities
directly. The comparison table on the previous page is an EXCEL
spreadsheet.


5.3 The Poisson Distribution 


In the last two sections, we have used the binomial distribution and the 
hypergeometric distribution to find the probability of a given number of 
successes in a series of trials — e.g., the number of heads in 3 coin 
tosses or the number of females selected for drug testing. In this section, 
we will study the Poisson distribution, which is also used to find the 
probability of a number of occurrences — e.g., the number of accidents 
at an intersection in a week or the number of claims an insured files with 
a company in a year. We will first look at the example of the number of 
accidents at an intersection to get an idea of the kind of problems that 
are modeled by the Poisson distribution. 


5.3.1 The Poisson Distribution 


Example 5.14 A busy intersection is the scene of many traffic 
accidents. An analyst studies data on the accidents and concludes that 
accidents occur there at "an average rate of λ = 2 per month". This does
not mean that there are exactly 2 accidents in each month. In any given
month there may be any number of accidents, k = 0, 1, 2, 3, .... The
number of accidents X in a month is a random variable. The Poisson
distribution can be used to find the probabilities P(X = k) in terms of
k and λ, the average rate.
Poisson Distribution

The random variable X follows the Poisson distribution with
parameter (or average rate) λ if

$$P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, 3, \ldots \qquad (5.5a)$$

For this distribution,⁵

$$E(X) = \lambda \qquad (5.5b)$$
and
$$V(X) = \lambda. \qquad (5.5c)$$


The number of accidents in a month at this intersection can be
modeled using the Poisson distribution with an average rate of λ = 2 if
we make a few reasonable assumptions about how accidents occur. We
will discuss why the Poisson distribution works well for this problem
later in this section and again in Chapter 8. Once we accept that the
Poisson distribution is the right one to use here, it is a simple matter to
calculate probabilities, mean and variance. If X is the number of
accidents in a month, then


$$P(X = 0) = \frac{e^{-2}2^0}{0!} \approx .1353353,$$

$$P(X = 1) = \frac{e^{-2}2^1}{1!} \approx .2706706,$$

$$P(X = 2) = \frac{e^{-2}2^2}{2!} \approx .2706706,$$

E(X) = 2 and V(X) = 2.

It should not be too surprising that the mean of X is 2, since 2 was given
as the average rate of accidents per month. □
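The calculations of Example 5.14 can be repeated in a few lines. The Python sketch below evaluates formula (5.5a) and approximates E(X) and V(X) by truncated sums. The cutoff k ≤ 60 is our own assumption, chosen because later terms are negligible for λ = 2.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Equation (5.5a): P(X = k) = e^(-lam) lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

lam = 2  # accidents per month, Example 5.14
for k in range(3):
    print(k, round(poisson_pmf(k, lam), 7))

# Truncated sums for E(X) and V(X); terms beyond k = 60 are negligible for lam = 2.
ks = range(61)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum(k * k * poisson_pmf(k, lam) for k in ks) - mean**2
print(round(mean, 10), round(var, 10))
```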


The Poisson distribution is used to model a wide variety of
situations in which some event (such as an accident) is said to occur at
an average rate λ per time period.


5 A derivation of E(X) = λ will be provided in Section 5.3.4. The proof that V(X) = λ
is outlined in Exercise 5-22.
Example 5.15 The holders of an insurance policy file claims at an
average rate of 0.45 per year. Use the Poisson model to answer the
following questions.

(a) Find the probability that a policyholder files at least one
claim in a year.

(b) Find the mean number of claims per policyholder per year.

(c) Suppose each claim pays exactly $1000. Find the mean
claim amount for a policyholder in a year. (This is the pure
premium for the policy.)

Solution

(a) Let X be the number of claims.

$$P(\text{at least one claim}) = 1 - P(\text{no claims}) = 1 - P(X = 0) = 1 - \frac{e^{-.45}(.45)^0}{0!} \approx 1 - .6376 = .3624.$$

(b) E(X) = λ = .45 claims per client per year.

(c) The annual claim amount random variable is Y = 1000X.
Equation (4.4a) states that E(aX) = a·E(X). Thus the
pure premium is

$$E(Y) = E(1000X) = 1000E(X) = 1000(.45) = 450.$$ □
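Each of the three answers reduces to a one-line computation. Below is a minimal Python sketch. The variable names are our own choices.

```python
from math import exp

lam = 0.45       # average claims per policyholder per year
benefit = 1000   # each claim pays exactly $1000

p_at_least_one = 1 - exp(-lam)   # 1 - P(X = 0)
mean_claims = lam                # E(X) = lambda
pure_premium = benefit * lam     # E(1000X) = 1000 E(X)
print(round(p_at_least_one, 4), mean_claims, pure_premium)
```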


5.3.2 The Poisson Approximation to the Binomial for Large
n and Small p


With two reasonable assumptions we can demonstrate why the Poisson 
distribution gives realistic answers for the probabilities in Example 5.14: 


Assumption 1 The probability of exactly one accident in a small
time interval of length t is approximately λt. For example, if a month
consists of 30 days, the month will have 30(24) = 720 hours so that an
hour is a time interval of length t = 1/720 of a month. If the rate of
accidents is λ = 2 per month, the probability of an accident in a single
hour is λt = 2/720 (or 2 accidents per 720 hours).
Assumption 2 Accidents occur independently in time intervals
which do not intersect.


With these two assumptions, we can find the probability of any 
given number of accidents in a month using the binomial distribution. 
Divide the month into 720 distinct hours which do not intersect. In each 
hour, the probability of an accident is p = 2/720. Since accidents occur 
independently in these 720 hours, we can think of observing accidents 
over a month as a binomial experiment with n = 720 trials and 
p = 2/720. Let X be the number of accidents in a month. Using the 
binomial distribution 


$$P(X = 1) = \binom{720}{1}\left(\frac{2}{720}\right)^1\left(1 - \frac{2}{720}\right)^{719} \approx .2706702.$$


In Example 5.14 we found P(X = 1) to be .2706706 using the 
Poisson formula. The binomial calculation gives the same answer as the 
Poisson, to 5 places, for P(X = 1). 
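The agreement gets better as the month is cut into finer pieces. The Python sketch below evaluates the binomial P(X = 1) with p = λ/n for several values of n. The text uses only the hourly case, so the subdivisions into days and minutes are our own additions.

```python
from math import comb, exp

lam = 2  # accidents per month

def binom_p1(n):
    """Binomial P(X = 1) with n subintervals, each with accident probability lam/n."""
    p = lam / n
    return comb(n, 1) * p * (1 - p) ** (n - 1)

poisson_p1 = lam * exp(-lam)
for n in (30, 720, 43200):   # days, hours, minutes in a 30-day month
    print(n, round(binom_p1(n), 7), round(poisson_p1, 7))
```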

This relationship between Poisson and binomial probabilities is no
accident. The binomial distribution with n = 720 and p = 2/720 is very
closely approximated by the Poisson distribution with λ = 2. In the
following table we give probability values for (a) the binomial
distribution with n = 720 and p = 2/720, and (b) the Poisson distribution
with λ = 2 for k = 0, 1, ..., 10. The values are very close.

[Table: P(X = k) for k = 0, 1, ..., 10 under the Poisson distribution with λ = 2 and the binomial distribution with n = 720, p = 2/720.]
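The numerical entries of this comparison can be regenerated with a short script. The Python sketch below assumes Python 3.8+ for `math.comb`. It is not the EXCEL or MINITAB calculation mentioned later in the Technology Note, so its last printed digits may differ slightly from a table rounded by those programs.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

lam, n = 2, 720
rows = [(k, poisson_pmf(k, lam), binom_pmf(k, n, lam / n)) for k in range(11)]

print(" k   Poisson    Binomial")
for k, pois, binom in rows:
    print(f"{k:2d}  {pois:.7f}  {binom:.7f}")
```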
Thus we can think of the Poisson probabilities for an average rate
of 2 accidents per month as approximately binomial probabilities for
n = 720 hourly trials per month, with a probability of p = 2/720 for one
accident in an hour. In general, the Poisson probabilities for any rate λ
approximate binomial probabilities for large n and small p = λ/n.


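Poisson Approximation to the Binomial

If n is large and p = λ/n is small, then P(X = k) can be
calculated using the Poisson or the binomial with approximately the
same answer.

$$\frac{e^{-\lambda}\lambda^k}{k!} \approx \binom{n}{k}\left(\frac{\lambda}{n}\right)^k\left(1 - \frac{\lambda}{n}\right)^{n-k} \qquad (5.6)$$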


We will give some idea of why this is true in the next section. 


Example 5.16 In Example 5.15 we looked at an insurance
company whose clients file claims at an average rate of λ = .45 per year.
The company has 500 clients. What is the probability that a client files
exactly one claim?

Solution Let X be the number of claims filed. If we use the
Poisson distribution,

$$P(X = 1) = \frac{e^{-.45}(.45)^1}{1!} \approx .2869.$$

If we are willing to assume that the 500 clients are independent, we can
look at X as the number of successes in 500 trials with n = 500 and
p = .45/500. Then

$$P(X = 1) = \binom{500}{1}\left(\frac{.45}{500}\right)^1\left(1 - \frac{.45}{500}\right)^{499} \approx .2871.$$ □
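Both answers in Example 5.16 can be checked numerically. The Python sketch below assumes Python 3.8+ for `math.comb`.

```python
from math import comb, exp

lam, n = 0.45, 500
p = lam / n

poisson_p1 = lam * exp(-lam)                          # e^(-.45)(.45)^1 / 1!
binom_p1 = comb(n, 1) * p * (1 - p) ** (n - 1)        # n = 500, p = .45/500
print(round(poisson_p1, 4), round(binom_p1, 4))
```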


5.3.3 Why Poisson Probabilities Approximate 
Binomial Probabilities 


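To understand the Poisson approximation to the binomial, we need to
review the definition of the number e and the implied value of $e^{-\lambda}$.

$$e = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n, \qquad e^{-\lambda} = \lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^n$$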
This means that for large n,

$$\left(1 - \frac{\lambda}{n}\right)^n \approx e^{-\lambda}.$$

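To see how this identity can be used to establish the approximation, we
will look at the simplest cases, i.e., P(X = 0) and P(X = 1).
For X = 0, the Poisson gives

$$P(X = 0) = e^{-\lambda}.$$

The binomial with large n and p = λ/n gives

$$P(X = 0) = \binom{n}{0}\left(\frac{\lambda}{n}\right)^0\left(1 - \frac{\lambda}{n}\right)^n = \left(1 - \frac{\lambda}{n}\right)^n \approx e^{-\lambda}.$$

For X = 1, the Poisson gives

$$P(X = 1) = \lambda e^{-\lambda}.$$

The binomial with large n and p = λ/n gives

$$P(X = 1) = \binom{n}{1}\left(\frac{\lambda}{n}\right)^1\left(1 - \frac{\lambda}{n}\right)^{n-1} = \lambda\,\frac{\left(1 - \frac{\lambda}{n}\right)^n}{1 - \frac{\lambda}{n}} \approx \lambda e^{-\lambda},$$

since $1 - \frac{\lambda}{n} \approx 1$.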


The general proof of the approximation is based on the same principles, 
but requires much more rearranging of terms. 
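The limit behind this argument can be watched numerically. The Python sketch below prints how far the binomial P(X = 0) and P(X = 1), with p = λ/n, are from the Poisson values as n grows. The value λ = 2 is an arbitrary choice.

```python
from math import exp

lam = 2.0  # an arbitrary fixed average rate

def binom_p0(n):
    """Binomial P(X = 0) with p = lam/n, i.e. (1 - lam/n)^n."""
    return (1 - lam / n) ** n

def binom_p1(n):
    """Binomial P(X = 1) with p = lam/n, i.e. n (lam/n) (1 - lam/n)^(n - 1)."""
    return n * (lam / n) * (1 - lam / n) ** (n - 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, binom_p0(n) - exp(-lam), binom_p1(n) - lam * exp(-lam))
```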


5.3.4 Derivation of the Expected Value of a 
Poisson Random Variable 


In order to prove that E(X) = λ for a Poisson distribution with rate λ,
we need to review the series expansion for $e^x$:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots$$
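The expected value of X is also an infinite series.

$$E(X) = \sum_{k=0}^{\infty} k \cdot P(X = k) = 0\cdot\frac{e^{-\lambda}\lambda^0}{0!} + 1\cdot\frac{e^{-\lambda}\lambda^1}{1!} + 2\cdot\frac{e^{-\lambda}\lambda^2}{2!} + 3\cdot\frac{e^{-\lambda}\lambda^3}{3!} + \cdots$$

$$= \lambda e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right) = \lambda e^{-\lambda}e^{\lambda} = \lambda.$$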
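The series argument can be illustrated by adding up the terms k·P(X = k). The Python sketch below uses λ = 3.5, a value chosen only for illustration. The partial sums approach λ as more terms are included.

```python
from math import exp, factorial

def poisson_mean_partial(lam, terms):
    """Partial sum of k * e^(-lam) lam^k / k! for k = 0, ..., terms - 1."""
    return sum(k * exp(-lam) * lam**k / factorial(k) for k in range(terms))

for terms in (5, 10, 20, 40):
    print(terms, poisson_mean_partial(3.5, terms))
```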


Technology Note 


The Poisson formulas are simple to evaluate on any modern
calculator. However, the distribution is used so often that the TI-83
calculator has a time-saving function (poissonpdf) which calculates
Poisson probabilities. For example, if λ = 2, entering

poissonpdf(2, 1)

from the DISTR menu gives .27067 = P(X = 1).

Microsoft® EXCEL has a POISSON function to calculate Poisson
probabilities, and MINITAB will generate tables of Poisson probabilities.
The table which compared Poisson and binomial probabilities in
Section 5.3.2 was calculated in both EXCEL and in MINITAB.


5.4 The Geometric Distribution 


5.4.1 Waiting Time Problems 


The geometric distribution is used to study how many failures will 
occur before the first success in a series of independent trials. We have 
already looked at a geometric distribution problem in Example 4.7. This 
example dealt with a slot machine for which the probability of winning 
on an individual play was .05 and successive plays were independent. 
The random variable of interest was X, the number of unsuccessful 
plays before the first win. This is a waiting time random variable — it 
represents the number of losses we must wait through before our first 
win. 
The general setting for a geometric distribution problem has many
features in common with a binomial distribution problem:


(1) The experiment consists of repeating identical success-or- 
failure trials until the first success occurs. 

(2) The trials are independent. 

(3) On each trial P(S) = p and P(F) = 1 − p = q.

(4) The random variable of interest is X, the number of failures 
before the first success. 


The probability of k failures before the first success can be found 
by the multiplication rule for independent events: 


Geometric Distribution

$$P(X = k) = q^k p, \quad k = 0, 1, 2, 3, \ldots \qquad (5.7)$$


Example 5.17 Let X be the number of unsuccessful plays before
the first win on the slot machine in Example 4.7. X follows the
geometric distribution with p = .05 and q = .95. Then

$$P(X = k) = (.95)^k(.05), \quad k = 0, 1, 2, 3, \ldots$$

This was derived in Example 4.7 using the multiplication rule. □


Example 5.18 A telemarketer makes repeated calls to persons on
a computer generated list. The probability of making a sale on any
individual call is p = .10. Successive calls are independent. Let X be the
number of unsuccessful calls before the first sale. Then X has a
geometric distribution with

$$P(X = k) = (.90)^k(.10), \quad k = 0, 1, 2, 3, \ldots$$ □
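Formula (5.7) is simple enough to evaluate directly. The Python sketch below applies it to the telemarketer of Example 5.18. The function name `geometric_pmf` is our own choice.

```python
def geometric_pmf(k, p):
    """Equation (5.7): probability of k failures before the first success."""
    return (1 - p) ** k * p

p = 0.10  # probability of a sale on each call (Example 5.18)
for k in range(4):
    print(k, round(geometric_pmf(k, p), 4))

# The probabilities form a geometric series whose total is 1; a long partial sum confirms it.
print(sum(geometric_pmf(k, p) for k in range(500)))
```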


Example 5.19⁶ An unemployed worker goes out to look for a job
every day. The probability of finding a job on any single day is λ. Let X
be the number of days of job search before the worker finds a job. If we
assume that successive days are independent, then

$$P(X = k) = (1 - \lambda)^k \lambda, \quad k = 0, 1, 2, 3, \ldots$$ □


6 This example is taken from London [9].
5.4.2 The Mean and Variance of the Geometric Distribution

The mean and variance of the geometric distribution are given below.

Geometric Distribution Mean and Variance

$$E(X) = \frac{q}{p}$$

$$V(X) = \frac{q}{p^2}$$


Example 5.20 Let X be the number of unsuccessful plays on the
slot machine in Example 5.17.

(a) $$E(X) = \frac{q}{p} = \frac{.95}{.05} = 19$$

(b) $$V(X) = \frac{q}{p^2} = \frac{.95}{.05^2} = 380$$ □
The expected value of 19 in the last example was previously
derived in Example 4.15 using Equation (4.3). We can follow the steps
of Example 4.15 to derive the general expression for the mean of a
geometric random variable X with P(S) = p.


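$$E(X) = 0p + 1pq + 2pq^2 + 3pq^3 + \cdots + kpq^k + \cdots$$

$$= pq\left(1 + 2q + 3q^2 + 4q^3 + \cdots + kq^{k-1} + \cdots\right)$$

$$= pq \cdot \frac{1}{(1 - q)^2} = \frac{pq}{p^2} = \frac{q}{p}.$$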
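The closed forms q/p and q/p² can be checked against long partial sums of the series. The Python sketch below truncates at 5000 terms. This cutoff is our own assumption, and it is harmless for p = .05 because the neglected tail is astronomically small.

```python
def geometric_moments(p, terms=5000):
    """Mean and variance of X (failures before first success) from truncated sums."""
    q = 1 - p
    pmf = [q**k * p for k in range(terms)]
    mean = sum(k * f for k, f in enumerate(pmf))
    var = sum(k * k * f for k, f in enumerate(pmf)) - mean**2
    return mean, var

mean, var = geometric_moments(0.05)
print(mean, 0.95 / 0.05)       # compare with q/p
print(var, 0.95 / 0.05**2)     # compare with q/p^2
```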


We will show how to derive the expression for V(X) in a later 
section. 


5.4.3 An Alternate Formulation of the Geometric Distribution 


We defined the geometric random variable X to be the number of 
failures before the first success. Other texts define the geometric random 
variable to be Y, the total number of trials needed to obtain the first 
success — including the trial on which the success occurs. This implies 
that Y = X + 1, and changes things slightly. 
Our text: $$P(X = k) = q^k p, \quad k = 0, 1, 2, 3, \ldots$$

Alternative: $$P(Y = k) = P(X + 1 = k) = P(X = k - 1) = q^{k-1}p, \quad k = 1, 2, 3, \ldots$$


When the alternative form is used, the expression for the mean changes
slightly and the expression for the variance remains the same. We can
show this using the relationships $E(aX + b) = a \cdot E(X) + b$ and
$V(aX + b) = a^2 \cdot V(X)$.

$$E(Y) = E(X + 1) = E(X) + 1 = \frac{q}{p} + 1 = \frac{q + p}{p} = \frac{1}{p}$$

$$V(Y) = V(X + 1) = V(X) = \frac{q}{p^2}$$


Our use of X as the geometric random variable is consistent with 
Bowers et al. [2]. The reader needs to exercise care in problems to be 
sure that X is not mistaken for Y or vice versa. 


Example 5.21 The telemarketer in Example 5.18 makes successive
independent calls with success probability p = .10. The calls cost
$0.50 each. What is the expected cost of obtaining the first success
(sale)?

Solution The total number of calls needed to obtain the first sale
includes the call on which the sale is made. Thus Y = X + 1 is the
number of calls to make the first sale, and .50Y is the cost of the first
sale.

$$E(.50Y) = .50E(Y) = .50E(X + 1) = .50[E(X) + 1] = .50\left(\frac{.90}{.10} + 1\right) = \$5.00$$ □
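A quick simulation gives an independent check on the $5.00 answer. The Python sketch below uses the standard `random` module with a fixed seed (our choice) so the run is repeatable. The simulated average should land close to $5.00, though not exactly on it.

```python
import random

def calls_until_first_sale(p, rng):
    """Simulate Y: the number of calls up to and including the first sale."""
    calls = 1
    while rng.random() >= p:   # this call fails, which happens with probability 1 - p
        calls += 1
    return calls

rng = random.Random(2006)      # fixed seed so the run is repeatable
p, cost_per_call, trials = 0.10, 0.50, 50_000
avg_cost = cost_per_call * sum(calls_until_first_sale(p, rng) for _ in range(trials)) / trials
exact_cost = cost_per_call * ((1 - p) / p + 1)   # .50[E(X) + 1]
print(round(avg_cost, 2), exact_cost)
```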
Technology Note

The TI-83 calculator has a function

geometpdf(p, x)

for which p is the probability of success and x is the number of trials
needed for the first success. Thus the TI-83 calculates probabilities for
the random variable Y = X + 1. Entering

geometpdf(.10, 2)

from the DISTR menu will return the answer .09.

Microsoft® EXCEL will calculate geometric probabilities as a
special case of the negative binomial distribution. This will be covered
in the next section.
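The two conventions differ only by a shift of one. The Python helpers below are hypothetical illustrations that mimic the trial-counting convention described for geometpdf. They are not the calculator's implementation.

```python
def geomet_pdf(p, x):
    """P(Y = x) = q^(x - 1) p: the first success occurs on trial x (x = 1, 2, ...).
    Mirrors the trial-counting convention described for the TI-83's geometpdf."""
    if x < 1:
        return 0.0
    return (1 - p) ** (x - 1) * p

def failures_pdf(p, k):
    """P(X = k) = q^k p in this text's convention (k failures before the first success)."""
    return geomet_pdf(p, k + 1)

print(round(geomet_pdf(0.10, 2), 4))    # the TI-83 example
print(round(failures_pdf(0.10, 1), 4))  # the same event: one failed call, then the sale
```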


5.5 The Negative Binomial Distribution 


5.5.1 Relation to the Geometric Distribution 


The geometric random variable X represents the number of failures
before the first success. In some cases, it may be useful to study the
number of failures before the second success, or the third or the fourth.
The negative binomial distribution gives probabilities for X, the
number of failures before the rth success. We will solve a problem of
this type directly before giving the general probability formulas.


Example 5.22 You are playing the slot machine on which the
probability of a win on any individual trial is .05. You will play until you
win twice. What is the probability that you will lose exactly 4 times
before the second win?

Solution There are a number of different sequences of wins and 
losses which will give exactly four losses before the second win. For 
example, if S stands for a success (win) and F stands for a failure (loss), 
two such sequences are SFFFFS and FSFFFS. Note that the 
probability of each of the above sequences can be obtained using the 
multiplication rule for independent events. 
$$P(SFFFFS) = P(FSFFFS) = (.95)^4(.05)^2$$

The probability of any sequence with exactly four losses before the
second win will be the same value $(.95)^4(.05)^2$. However, there are
clearly more such sequences than the two above. The number of such
sequences can be counted using a simple idea. The last letter in the
sequence must be an S. We really only need to count the number of
ways to put a five-letter sequence consisting of 1 S and four F's in front
of the last S.

{5-letter sequence with one S} followed by {final S}

We can create a 5-letter sequence with one S by simply choosing the one
place in the sequence where the single S appears. The number of ways
this can be done is $\binom{5}{1} = 5$. Thus there are 5 sequences with exactly 4
losses before the second win. Each sequence has a probability of
$(.95)^4(.05)^2$. The probability of exactly 4 losses before the second win is

$$P(X = 4) = 5(.95)^4(.05)^2 \approx .01018.$$ □
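The counting argument can be confirmed by brute force. The Python sketch below uses `itertools.product` to list every sequence of six plays that ends in S and contains exactly two S's.

```python
from itertools import product

p, q = 0.05, 0.95  # win and loss probabilities per play

# Sequences of 6 plays ending in a win (S) with exactly two wins in total:
# exactly these outcomes give 4 losses before the second win.
sequences = [
    "".join(first_five) + "S"
    for first_five in product("SF", repeat=5)
    if first_five.count("S") == 1
]
prob = len(sequences) * q**4 * p**2
print(sequences)
print(len(sequences), round(prob, 5))
```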


In the general negative binomial problem, the number of desired
successes is denoted by r. (In the last example, r = 2 and a win was a
success.) The random variable of interest is X, the number of failures
before success r in a series of independent trials. As before, X will
assume the value k if there is a sequence of r successes (S) and k
failures (F) with last letter S. (In the last example we looked at k = 4.)
The probability of any such sequence will be $q^k p^r$. Each such sequence
will have r + k entries, with S as a final entry. The form of a sequence
is

{r + k − 1 letters with exactly r − 1 copies of S} followed by {final S}.

The number of ways to choose the location of the r − 1 copies of S in
the first r + k − 1 letters is $\binom{r+k-1}{r-1}$. (In the last example,
r + k − 1 = 5 and r − 1 = 1.) The probability that X = k will be given
by the product

(Number of sequences)(Probability of an individual sequence).
Negative Binomial Distribution

A series of independent trials has P(S) = p on each trial.
Let X be the number of failures before success r.

$$P(X = k) = \binom{r + k - 1}{r - 1} p^r q^k, \quad k = 0, 1, 2, 3, \ldots \qquad (5.9)$$


Example 5.23 The telemarketer in Example 5.18 makes successful
calls with probability p = .10. What is the probability of making
exactly 5 unsuccessful calls before the third sale is made?

Solution In this problem, r = 3 and k = 5.

$$P(X = 5) = \binom{3 + 5 - 1}{3 - 1}(.10)^3(.90)^5 = \binom{7}{2}(.00059049) = 21(.00059049) \approx .0124$$

Rote memorization of the distribution formula is not recommended. An
intuitive approach is more effective. In this problem, one should think of
sequences of 8 letters (calls) ending in S with exactly 2 copies of S in
the first 7 letters. Each sequence has probability $(.90)^5(.10)^3$ and there
are $\binom{7}{2} = 21$ such sequences. □
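Formula (5.9) translates directly into code. The Python sketch below reproduces Examples 5.22 and 5.23 and checks that r = 1 gives geometric probabilities. It assumes Python 3.8+ for `math.comb`, and `neg_binomial_pmf` is our own name for the helper.

```python
from math import comb

def neg_binomial_pmf(k, r, p):
    """Equation (5.9): probability of k failures before the r-th success."""
    return comb(r + k - 1, r - 1) * p**r * (1 - p) ** k

print(round(neg_binomial_pmf(5, 3, 0.10), 4))   # Example 5.23
print(round(neg_binomial_pmf(4, 2, 0.05), 5))   # Example 5.22
# With r = 1 the formula reduces to the geometric probability q^k p.
print(neg_binomial_pmf(3, 1, 0.10), 0.9**3 * 0.10)
```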


It is important to note one special case. When r = 1, X is the
number of failures before the first success, a geometric random
variable. This is intuitively obvious, and can also be verified in the
distribution formula. For r = 1,

$$P(X = k) = \binom{1 + k - 1}{1 - 1}p^1 q^k = \binom{k}{0}pq^k = q^k p.$$


5.5.2 The Mean and Variance of the 
Negative Binomial Distribution 


The expressions given below will not be derived until a later chapter. 
However, we will give examples which should make these formulas 
intuitively reasonable. 
Negative Binomial Distribution Mean and Variance

$$E(X) = \frac{rq}{p} \qquad (5.10a)$$

$$V(X) = \frac{rq}{p^2} \qquad (5.10b)$$


Example 5.24 We return to Example 5.22 and the slot machine
player who wishes to win twice. For this player, r = 2 and p = .05.
Thus

$$E(X) = \frac{2(.95)}{.05} = 2 \cdot 19 = 38 \quad\text{and}\quad V(X) = \frac{2(.95)}{.05^2} = 2 \cdot 380 = 760.$$

These answers can be related to the geometric distribution. Recall that
we have already calculated the mean and variance for the geometric
distribution case (r = 1) in Example 5.20. The mean number of losses
before the first win was 19. Now we see that the mean number of losses
before the second win is 2 × 19. The player waits through 19 losses on
the average for the first win. After the first win occurs, the player starts
over and must wait through an average of 19 losses for the second time.
Similarly, the variance of the number of losses for the first win was 380.
For the second win it is 2 × 380. □


This example illustrates that we can look at X, the number of
failures before the second success, as a sum of independent random
variables. Let $X_1$ be the number of failures before the first success and
$X_2$ the number of subsequent failures before the second success. Then
$X_1$ and $X_2$ are independent random variables, and $X = X_1 + X_2$. If we
are waiting for the second success, we wait through $X_1$ failures for the
first success and then repeat the process as we go through $X_2$ subsequent
failures before the second success, for a total of $X = X_1 + X_2$ failures.
Note that although the separate waits $X_1$ and $X_2$ follow the same kind of
geometric distribution, $X_1$ and $X_2$ can have different values. Thus
$X_1 + X_2$ is not the same as $2X_1$. (A common student mistake is to
confuse $X_1 + X_2$ and $2X_1$.) Sums of random variables will be studied
further in Chapter 11.
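The claim that $X = X_1 + X_2$ follows the negative binomial distribution with r = 2 can be checked numerically. By independence, $P(X_1 + X_2 = k)$ is the sum over j of $P(X_1 = j)P(X_2 = k - j)$. The Python sketch below applies this to the slot machine (p = .05). The helper names are our own.

```python
from math import comb

p = 0.05
q = 1 - p

def geometric_pmf(k):
    """P(X_i = k) for one geometric wait (failures before a success)."""
    return q**k * p

def neg_binomial_pmf(k, r):
    """Equation (5.9) with this p."""
    return comb(r + k - 1, r - 1) * p**r * q**k

def sum_of_two_geometrics_pmf(k):
    """P(X_1 + X_2 = k) for independent X_1, X_2: a convolution of the two waits."""
    return sum(geometric_pmf(j) * geometric_pmf(k - j) for j in range(k + 1))

for k in range(6):
    print(k, sum_of_two_geometrics_pmf(k), neg_binomial_pmf(k, 2))
```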
Technology Note

Microsoft® EXCEL has a NEGBINOMDIST function which
calculates probabilities for this distribution. The table below was done in
EXCEL. It shows the negative binomial probabilities for p = .10 and
r = 1, 2 and 3. p(k) = P(X = k) is given for k = 0, 1, ..., 10. We have
also included the cumulative probability F(k) = P(X ≤ k).


Negative Binomial Distribution, p = .10

          r = 1               r = 2               r = 3
 k    p(k)     F(k)       p(k)     F(k)       p(k)     F(k)
 0   0.10000  0.10000    0.01000  0.01000    0.00100  0.00100
 1   0.09000  0.19000    0.01800  0.02800    0.00270  0.00370
 2   0.08100  0.27100    0.02430  0.05230    0.00486  0.00856
 3   0.07290  0.34390    0.02916  0.08146    0.00729  0.01585
 4   0.06561  0.40951    0.03281  0.11427    0.00984  0.02569
 5   0.05905  0.46856    0.03543  0.14969    0.01240  0.03809
 6   0.05314  0.52170    0.03720  0.18690    0.01488  0.05297
 7   0.04783  0.56953    0.03826  0.22516    0.01722  0.07019
 8   0.04305  0.61258    0.03874  0.26390    0.01937  0.08956
 9   0.03874  0.65132    0.03874  0.30264    0.02131  0.11087
10   0.03487  0.68619    0.03835  0.34100    0.02301  0.13388


The value of p = .10 was used in our analysis of the telemarketer. 
The above table tells the telemarketer (or his manager) quite a bit about 
the risks of his job. There is a reasonable probability (.68619) that the 
first sale will be made with 10 or fewer unsuccessful calls. There is a 
low probability (.13388) that three sales will be made with 10 or fewer 
unsuccessful calls. 

This table was stopped at k = 10 only for reasons of space. The 
reader who constructs it for herself will find that it takes only a few 
additional seconds to extend the table to k = 78. This gives a fairly 
complete picture of the probabilities involved. 
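The table does not require EXCEL. The Python sketch below is our own (it assumes Python 3.8+ for `math.comb`) and builds p(k) and F(k) = P(X ≤ k) for r = 1, 2, 3. Its entries should match the table up to rounding in the last digit.

```python
from math import comb

def neg_binomial_pmf(k, r, p):
    """Equation (5.9): P(X = k) for k failures before success r."""
    return comb(r + k - 1, r - 1) * p**r * (1 - p) ** k

p, max_k = 0.10, 10
table = {}                      # (r, k) -> (p(k), F(k))
for r in (1, 2, 3):
    running = 0.0
    for k in range(max_k + 1):
        pk = neg_binomial_pmf(k, r, p)
        running += pk           # F(k) = P(X <= k)
        table[(r, k)] = (pk, running)

for k in range(max_k + 1):
    cells = "   ".join(f"{table[(r, k)][0]:.5f}  {table[(r, k)][1]:.5f}" for r in (1, 2, 3))
    print(f"{k:2d}   {cells}")
```

Setting `max_k = 78` extends the table in the way the text suggests.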
5.6 The Discrete Uniform Distribution


One of our first probability examples dealt with the experiment of 
rolling a single fair die and observing the number X that came up. The 
sample space was S = {1, 2, 3, 4, 5, 6} and each of the outcomes was 
equally likely with probability 1/6. The random variable X is said to 
have a discrete uniform distribution on 1, ..., 6. This is a special case 
of the discrete uniform distribution on 1, ..., n. 


Discrete Uniform Distribution on 1, ..., n

$$p(x) = \frac{1}{n}, \quad x = 1, \ldots, n$$

$$E(X) = \frac{n + 1}{2}$$

$$V(X) = \frac{n^2 - 1}{12}$$


Example 5.25 Let X be the number that appears when a single
fair die is rolled. Then

$$E(X) = \frac{6 + 1}{2} = 3.5$$
and
$$V(X) = \frac{6^2 - 1}{12} = \frac{35}{12} \approx 2.9167.$$ □

In Exercise 5-33 you will be asked to verify the results of 
Example 5.25 by direct calculation using the definitions of E(X) and 
V(X). The derivations of E(X) and V(X) using summation formulas 
are outlined in Exercise 5-35. 
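Exact rational arithmetic avoids rounding when checking these results against the definitions. The Python sketch below uses the standard `fractions` module. The name `uniform_moments` is a hypothetical helper.

```python
from fractions import Fraction

def uniform_moments(n):
    """Mean and variance of the discrete uniform distribution on 1..n, from the definitions."""
    prob = Fraction(1, n)
    mean = sum(x * prob for x in range(1, n + 1))
    var = sum((x - mean) ** 2 * prob for x in range(1, n + 1))
    return mean, var

mean, var = uniform_moments(6)
print(mean, var, float(var))   # compare with (n + 1)/2 and (n^2 - 1)/12
```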
Exercises

The Binomial Distribution

5-1. A student takes a 10-question true-false test. He has not attended
class nor studied the material, and so he guesses on every
question. What is the probability that he gets (a) exactly 5
questions correct; (b) 8 or more correct?

5-2. A single fair die is rolled 10 times. What is the probability of
getting (a) exactly 2 sixes; (b) at least 2 sixes?

5-3. An insurance agent has 12 policyholders who are considered
high risk. The probability that one of these clients will file a
major claim in the next year is .023. What is the probability that
exactly 3 of them will file major claims in the next year?

5-4. A company produces light bulbs of which 2% are defective.

(a) If 50 bulbs are selected for testing, what is the probability
that exactly 2 are defective?

(b) If a distributor gets a shipment of 1,000 bulbs, what are
the mean and the variance of the number of defective
bulbs?

5-5. In the game of craps (dice table) the simplest bet is the pass line.
The probability of winning such a bet is .493 and the payoff is
even money, i.e., if you win you receive $1 more for each dollar
that you bet. A gambler makes a series of 100 $10 bets on the
pass line. What is his expected gain or loss at the end of this
sequence of bets?

5-6. In a large population 10% of the people have type B+ blood. At
a blood donation center 20 people donate blood. What is the
probability that (a) exactly 4 of these have B+ blood; (b) at most
3 have B+ blood?

5-7. In the population of Exercise 5-6, 50,000 pints of blood are
donated. What is the expected number of pints of B+ blood?
What is the variance of the number of pints of B+ blood?
5-8. An experiment consists of picking a card at random from a
standard deck and replacing it. If this experiment is performed
12 times, what is the probability that you get (a) exactly 2 aces;
(b) exactly 3 hearts; (c) more than 1 heart?

5-9. Suppose that 5% of the individuals in a large population have a
certain disease. If 15 individuals are selected at random, what is
the probability that no more than 3 have the disease?

5-10. For a binomial random variable X with n = 2 and P(S) = p,
show that (a) E(X) = 2p; (b) V(X) = 2p(1 − p).

The Hypergeometric Distribution

5-11. There are 10 cards lying face down on a table, and 2 of them are
aces. If 5 of these cards are selected at random, what is the
probability that 2 of them are aces?

5-12. In a hospital ward there are 16 patients, 4 of whom have AIDS.
A doctor is assigned to 6 of these patients at random. What is the
probability that he gets 2 of the AIDS patients?

5-13. A baseball team has 16 non-pitchers on its roster. Of these, 6 bat
left-handed and 10 right-handed. The manager, having already
selected the pitcher for the game, randomly selects 8 players for
the remaining positions.

(a) What is the probability that he selects 4 left-handed batters
and 4 right-handed batters?

(b) What is the expected number of left-handed batters
chosen?

5-14. The United States Senate has 100 members. Suppose there are
54 Republicans and 46 Democrats.

(a) If a committee of 15 is selected at random, what is the
expected number of Republicans on this committee?

(b) What is the variance of the number of Republicans?


5-15. A bridge hand consists of 13 cards. If X is the random variable
for the number of spades in a bridge hand, what are E(X) and
V(X)?

The Poisson Distribution

5-16. An auto insurance company has determined that the average
number of claims against the comprehensive coverage of a
policy is 0.6 per year. What is the probability that a policyholder
will file (a) 1 claim in a year; (b) more than 1 claim in a year?

5-17. A city has an intersection where accidents have occurred at an
average rate of 1.5 per year. What is the probability that there
will be (a) 0; (b) 1; (c) 2 accidents in a year?

5-18. Policyholders of an insurance company file claims at an average
rate of 0.38 per year. If the company pays $5,000 for each claim,
what is the mean claim amount for a policyholder in a year?

5-19. An insurance company has 5,000 policyholders who have had
policies for at least 10 years. Over this period there have been a
total of 12,200 claims on these policies. Assuming a Poisson
distribution for these claims, answer each of the following.

(a) What is λ, the average number of claims per policy per
year?

(b) What is the probability that a policyholder will file fewer
than 2 claims in a year?

(c) If all claims are for $1,000, what is the mean claim amount
for a policyholder in a year?

5-20. Claims filed in a year by a policyholder of an insurance company
have a Poisson distribution with λ = .40. The numbers of claims
filed by two different policyholders are independent.

(a) If two policyholders are selected at random, what is the
probability that each of them will file one claim during the
year?

(b) What is the probability that at least one of them will file no
claims?
5-21. Show that a Poisson distribution with parameter \(\lambda = k\) (an
integer) has two modes, \(k - 1\) and \(k\).


5-22. Show that \(V(X) = \lambda\) for a Poisson random variable X with
parameter \(\lambda\). Hint: Show \(V(X) = E(X^2) + E(-2\lambda X + \lambda^2)\)
and \(E(X^2) = \lambda^2 + \lambda\).


5.4 The Geometric Distribution


5-23. If you roll a pair of fair dice, the probability of getting an 11 is
1/18. (See Exercise 4-4.) If you roll the dice repeatedly, what is
the probability that the first 11 occurs on the eighth roll?


5-24. An experiment consists of drawing a card at random from a
standard deck and replacing it. If this experiment is done
repeatedly, what is the probability that (a) the first heart appears
on the fifth draw; (b) the first ace appears on the tenth draw?


5-25. For the experiment in Exercise 5-24, let X be the random
variable for the number of unsuccessful draws before the first ace is
drawn. Find E(X) and V(X).


5-26. At a medical clinic, patients are given X-rays to test for
tuberculosis.

(a) If 15% of these patients have the disease, what is the
probability that on a given day the first patient to have the
disease will be the fifth one tested?

(b) What is the probability that the first with the disease will
be the tenth one tested?


The Negative Binomial Distribution


5-27. Consider the experiment of drawing from a deck of cards with
replacement (Exercise 5-24).

(a) What is the probability that the third heart appears on the
tenth draw?

(b) What is the mean number of non-hearts drawn before the
fifth heart is drawn?


5-28. A single fair die is rolled repeatedly.

(a) What is the probability that the fourth six appears on the
twentieth roll?

(b) What is the mean number of total rolls needed to get 4
sixes?


5-29. For the experiment in Exercise 5-28, let X be the random
variable for the number of non-sixes rolled before the fifth six is
rolled. What are E(X) and V(X)?


5-30. A telemarketer makes successful calls with probability .20. What
is the probability that her fifth sale will be on her sixteenth call?


5-31. If each sale made by the person in Exercise 5-30 is for $250,
what is the mean number of total calls she will have to make to
reach $2,000 in total sales?


5-32. Consider the clinic in Exercise 5-26, where 15% of the patients
have tuberculosis.

(a) What is the probability that the fifteenth patient tested will
be the third with tuberculosis?

(b) What is the mean number of patients without tuberculosis
tested before the sixth patient with tuberculosis is tested?


The Discrete Uniform Distribution


5-33. Verify the results of Example 5.25 by direct calculation using the
definitions of E(X) and V(X).


5-34. A contestant on a game show selects a ball from an urn containing
25 balls numbered from 1 to 25. His prize is $1,000 times the
number of the ball selected. If X is the random variable for the
amount he wins, find the mean and standard deviation of X.


5-35. Derive the formulas for E(X) and V(X) for the discrete uniform
distribution. Recall that
\[ 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \]
and
\[ 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}. \]
5.8 Sample Actuarial Examination Problems


5-36. A company prices its hurricane insurance using the following
assumptions:

(i) In any calendar year, there can be at most one hurricane.

(ii) In any calendar year, the probability of a hurricane is 0.05.

(iii) The number of hurricanes in any calendar year is independent
of the number of hurricanes in any other calendar year.

Using the company's assumptions, calculate the probability that
there are fewer than 3 hurricanes in a 20-year period.


5-37. A study is being conducted in which the health of two independent
groups of ten policyholders is being monitored over a one-year
period of time. Individual participants in the study drop out
before the end of the study with probability 0.2 (independently
of the other participants).

What is the probability that at least 9 participants complete the
study in one of the two groups, but not in both groups?


5-38. A hospital receives 1/5 of its flu vaccine shipments from
Company X and the remainder of its shipments from other
companies. Each shipment contains a very large number of
vaccine vials.

For Company X's shipments, 10% of the vials are ineffective.
For every other company, 2% of the vials are ineffective. The
hospital tests 30 randomly selected vials from a shipment and
finds that one vial is ineffective.

What is the probability that this shipment came from Company
X?


5-39. An actuary has discovered that policyholders are three times as
likely to file two claims as to file four claims. If the number of
claims filed has a Poisson distribution, what is the variance of
the number of claims filed?


5-40. A company buys a policy to insure its revenue in the event of
major snowstorms that shut down business. The policy pays
nothing for the first such snowstorm of the year and 10,000 for
each one thereafter, until the end of the year. The number of
major snowstorms per year that shut down business is assumed
to have a Poisson distribution with mean 1.5.

What is the expected amount paid to the company under this
policy during a one-year period?


5-41. In modeling the number of claims filed by an individual under
an automobile policy during a three-year period, an actuary
makes the simplifying assumption that for all integers \(n \ge 0\),
\(p_{n+1} = \frac{1}{5}\,p_n\), where \(p_n\) represents the probability that the
policyholder files n claims during the period.

Under this assumption, what is the probability that a
policyholder files more than one claim during the period?


Chapter 6 
Applications for Discrete 
Random Variables 


6.1 Functions of Random Variables and Their 
Expectations 


6.1.1 The Function Y = aX + b


We have already looked at functions of random variables. In Sections
4.3 and 4.4, we looked at the function f(X) = aX + b and used the
identities
\[ E[f(X)] = E(aX + b) = a \cdot E(X) + b \]
and
\[ V[f(X)] = V(aX + b) = a^2 \cdot V(X). \]


For example, we looked at a random variable X for the number of
claims filed by an insurance policyholder in Example 4.6.


Number of claims (x) |  0  |  1  |  2  |  3
p(x)                 | .72 | .22 | .05 | .01


The expected value E(X) was .35 and the variance V(X) was .3875. In
Examples 4.17 and 4.22, we looked at the total cost random variable
f(X) = 1000X + 100. We then found
\[ E[f(X)] = E(1000X + 100) = 1000\,E(X) + 100 = 450 \]
and
\[ V[f(X)] = V(1000X + 100) = 1000^2\,V(X) = 387{,}500. \]
Simple derivations of these results were sketched previously, but a
closer look at the reasoning is needed. The reasoning used previously 
relied on the observation that Y = f(X) had a distribution table with the 
same underlying probabilities as X. 


Number of claims (x)      |  0  |   1  |   2  |   3
Cost: f(x) = 1000x + 100  | 100 | 1100 | 2100 | 3100
p(x)                      | .72 |  .22 |  .05 |  .01


For example, since the probability of 0 claims is .72, the probability of a
total cost of f(0) = 1000(0) + 100 = 100 will also be .72. We could check the
expected value above using this distribution table.
\[ E[f(X)] = .72(100) + .22(1100) + .05(2100) + .01(3100) = 450 = \sum f(x) \cdot p(x) \]
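As a quick check of the table-extension reasoning, the Python sketch below (our own illustration; the dictionary layout and the helper names `expect` and `cost` are assumptions, not from the text) evaluates E[f(X)] directly from the extended table and compares it with the linear identities for aX + b.

```python
# Claim-number distribution from Example 4.6: value -> probability
claims = {0: 0.72, 1: 0.22, 2: 0.05, 3: 0.01}

def expect(dist, f=lambda x: x):
    """E[f(X)] = sum of f(x) * p(x) over the table (hypothetical helper)."""
    return sum(f(x) * p for x, p in dist.items())

def cost(x):
    """Total cost f(x) = 1000x + 100."""
    return 1000 * x + 100

mean_x = expect(claims)                                # about .35
var_x = expect(claims, lambda x: (x - mean_x) ** 2)    # about .3875

# Extended table versus the identities E(aX + b) = aE(X) + b and V(aX + b) = a^2 V(X)
mean_cost_table = expect(claims, cost)
mean_cost_identity = 1000 * mean_x + 100
var_cost_identity = 1000 ** 2 * var_x
print(round(mean_cost_table, 6), round(mean_cost_identity, 6), round(var_cost_identity, 6))
```

Small floating-point differences are expected, which is why the results are rounded before printing.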
6.1.2 Analyzing Y = f(X) in General 


The identity
\[ E[f(X)] = \sum f(x) \cdot p(x) \qquad (6.1) \]
holds for any discrete random variable X and function f(x). However,
there is a subtle point here. This point is illustrated in the next example.


Example 6.1 Let the random variable X have the distribution below.


x    |  -1 |  0  |  1
p(x) | .20 | .60 | .20


If \(f(x) = x^2\), the naive table extension technique just used in Section
6.1.1 gives us a similar distribution.


f(x) = x² |   1 |  0  |  1
p(x)      | .20 | .60 | .20


Calculating the mean for \(X^2\) gives
\[ E(X^2) = \sum x^2 \cdot p(x) = .20(1) + .60(0) + .20(1) = .40. \]
The subtle point is that the previous table is not exactly the probability
distribution table for \(X^2\), since the value 1 appears twice in the top
row. The true distribution table for \(Y = X^2\) is the following:


y = x² |  0  |  1
p(y)   | .60 | .40


Using this table, we still get the same result.
\[ E(Y) = \sum y \cdot p(y) = .60(0) + .40(1) = .40. \] □


This example illustrates two major points: 


(1) The distribution table for X can be converted into a preliminary
table for f(X) with entries for f(x) and p(x), but some
grouping and combination may be necessary to get the actual
distribution table for Y = f(X).

(2) Even though the tables are not the same, they lead to the
same result for the expected value of Y = f(X):


\[ E(Y) = \sum y \cdot p(y) = E[f(X)] = \sum f(x) \cdot p(x) \]


The final summation above is the expression in Equation (6.1). It
is usually the simplest one to use to find E[f(X)]. The general proof of
Equation (6.1) follows the reasoning of the previous example, but will
not be given here.
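The grouping step in point (1) can be sketched in Python as follows. This is our own illustration, and `distribution_of` is a hypothetical helper name: it adds together the probabilities of all x values that share the same f(x), then checks that both tables give the same expected value for Example 6.1.

```python
from collections import defaultdict

# Distribution of X from Example 6.1
dist_x = {-1: 0.20, 0: 0.60, 1: 0.20}

def distribution_of(f, dist):
    """Group the preliminary table (f(x), p(x)) into the true table of Y = f(X)."""
    dist_y = defaultdict(float)
    for x, p in dist.items():
        dist_y[f(x)] += p          # equal values of f(x) are combined
    return dict(dist_y)

def square(x):
    return x ** 2

dist_y = distribution_of(square, dist_x)     # {1: .40, 0: .60}

# E(Y) from the grouped table and E[f(X)] from the ungrouped table agree
e_y = sum(y * p for y, p in dist_y.items())
e_fx = sum(square(x) * p for x, p in dist_x.items())
print(dist_y, e_y, e_fx)
```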


6.1.3 Applications


In this section we will give an elementary example from economics: the 
expected utility of wealth. 


Example 6.2 For most of us (but not all), the satisfaction obtained
from an extra dollar depends on how much wealth we have already. A
single dollar may be much less important to someone who has $500,000
in the bank than it is to someone who has nothing saved. Economists
describe this by using utility functions that measure the importance of
various levels of wealth to an individual. One utility function which fits
the attitude described above is \(u(w) = \sqrt{w}\), for wealth \(w \ge 0\). The
graph of u(w) is given in the following figure.

We can see from this graph that utility increases more rapidly at first and
then more slowly at higher levels of wealth, w. We will now look at how
a person with the utility function \(u(w) = \sqrt{w}\) might make financial
decisions. (The reader should be aware that this is only one possible
utility function. Other individuals may have very different utility functions
which lead to very different financial decisions.)

Suppose a person with the utility function \(u(w) = \sqrt{w}\) can choose
between two different methods for managing his wealth. Using Method
1, he has a 10% chance of ending up with w = 0 and a 90% chance of
ending up with w = 10,000. Using Method 2, he has a 2% chance of
ending up with w = 0 and a 98% chance of ending up with w = 9,025.
(Which would you choose?) These two methods of managing wealth are
really two random variables, \(W_1\) and \(W_2\).


Random variable \(W_1\) for Method 1

Wealth (w) |  0  | 10,000
p(w)       | .10 |   .90


Random variable \(W_2\) for Method 2

Wealth (w) |  0  | 9,025
p(w)       | .02 |  .98


One way to evaluate these two alternatives would be to compare their
expected values.
\[ E(W_1) = .10(0) + .90(10{,}000) = 9{,}000 \]
\[ E(W_2) = .02(0) + .98(9{,}025) = 8{,}844.50 \]

This comparison implies that Method 1 should be chosen, since it has
the higher expected value. However, this method does not take into
account the utility that is attached to various levels of wealth. The
expected utility method compares the two methods by calculating u(w)
for each outcome and comparing the two expected utilities \(E[u(W_1)]\)
and \(E[u(W_2)]\). We can expand the two tables for wealth outcomes to
include \(u(w) = \sqrt{w}\) for this calculation.


Method 1

Wealth (w)  |  0  | 10,000
u(w) = √w   |  0  |  100
p(w)        | .10 |  .90


Method 2

Wealth (w)  |  0  | 9,025
u(w) = √w   |  0  |  95
p(w)        | .02 |  .98


We can now compute expected utility.
\[ E[u(W_1)] = .10(0) + .90\sqrt{10{,}000} = 90 \]
\[ E[u(W_2)] = .02(0) + .98\sqrt{9{,}025} = 93.10 \]


Using expected utility, the person with \(u(w) = \sqrt{w}\) would choose
Method 2 instead of Method 1. □


Expected utility is analyzed much more deeply in other texts. The
important point here is that this economic decision-making method
makes use of the identity
\[ E[u(W)] = \sum u(w) \cdot p(w), \]
which was discussed in this section.
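The comparison in Example 6.2 can be reproduced with a few lines of Python. This is a sketch of the calculation only; the function name `expected` and the dictionary representation are our own choices, and \(u(w) = \sqrt{w}\) is the example's utility function, not a general recommendation.

```python
import math

# Wealth distributions for the two methods: wealth -> probability
method_1 = {0: 0.10, 10_000: 0.90}
method_2 = {0: 0.02, 9_025: 0.98}

def expected(dist, u=lambda w: w):
    """E[u(W)] = sum of u(w) * p(w); u defaults to the identity (plain expected wealth)."""
    return sum(u(w) * p for w, p in dist.items())

for name, dist in [("Method 1", method_1), ("Method 2", method_2)]:
    ev = expected(dist)               # expected wealth
    eu = expected(dist, math.sqrt)    # expected utility with u(w) = sqrt(w)
    print(f"{name}: E(W) = {ev:,.2f}, E[u(W)] = {eu:.2f}")
```

Method 1 wins on expected wealth, while Method 2 wins on expected utility, matching the example.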


6.1.4 Another Way to Calculate the Variance 
of a Random Variable 


In Section 4.4.1 we defined the variance of a random variable X by
\[ V(X) = E[(X - \mu)^2] = \sum (x - \mu)^2 \cdot p(x). \]

In that definition, we were implicitly using Equation (6.1) with
\(f(x) = (x - \mu)^2\). There is another way to write the variance. If we
expand the expression \((x - \mu)^2\), we obtain
\[
\begin{aligned}
V(X) &= \sum (x^2 - 2\mu x + \mu^2) \cdot p(x) \\
     &= \sum x^2 \cdot p(x) - 2\mu \sum x \cdot p(x) + \mu^2 \sum p(x) \\
     &= E(X^2) - 2\mu \cdot E(X) + \mu^2 \cdot 1 \\
     &= E(X^2) - 2\mu \cdot \mu + \mu^2 \cdot 1 \\
     &= E(X^2) - \mu^2.
\end{aligned}
\]

Thus we can write
\[ V(X) = E(X^2) - \mu^2 = E(X^2) - [E(X)]^2. \qquad (6.2) \]


Example 6.3 We will verify the variance calculated for the claim
number distribution from Example 4.6.


Number of claims (x) |  0  |  1  |  2  |  3
p(x)                 | .72 | .22 | .05 | .01


We know that E(X) = .35. Using Equation (6.1),
\[ E(X^2) = .72(0^2) + .22(1^2) + .05(2^2) + .01(3^2) = .51. \]
Then Equation (6.2) gives
\[ V(X) = E(X^2) - [E(X)]^2 = .51 - .35^2 = .3875. \]


This verifies our previous calculation obtained directly from the
definition. □
It is important to know Equation (6.2). It is widely used in
probability and statistics texts. These texts often note that the calculation of
V(X) can be done more easily using Equation (6.2) than from the
definition. This is true for computations done by hand, but computations
are rarely done by hand in our computer age. In fact, examples have been
developed to show that Equation (6.2) has a disadvantage for computer
work when large values of X are present: \(E(X^2)\) and \([E(X)]^2\) are then
huge, nearly equal numbers, so their difference can lose most of its
accuracy to round-off error, and for extreme values the squares can
overflow. This is pursued in Exercise 6-4.
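The following Python sketch is our own illustration, not the content of Exercise 6-4. It first confirms Example 6.3 with both formulas, then adds \(10^9\) to every claim count. A constant shift leaves the variance at .3875, but in double-precision arithmetic the squares near \(10^{18}\) are stored only to the nearest multiple of 128, so the subtraction in Equation (6.2) loses every meaningful digit while the definition still works.

```python
# Claim-number distribution from Example 6.3
probs = [0.72, 0.22, 0.05, 0.01]
values = [0, 1, 2, 3]

def var_definition(xs, ps):
    """V(X) = sum of (x - mu)^2 p(x)."""
    mu = sum(x * p for x, p in zip(xs, ps))
    return sum((x - mu) ** 2 * p for x, p in zip(xs, ps))

def var_shortcut(xs, ps):
    """V(X) = E(X^2) - [E(X)]^2, Equation (6.2)."""
    mu = sum(x * p for x, p in zip(xs, ps))
    return sum(x * x * p for x, p in zip(xs, ps)) - mu * mu

# Both formulas agree for the original small values (about .3875)
print(var_definition(values, probs), var_shortcut(values, probs))

# Shifting by a constant keeps the variance at .3875, but X^2 is now near 10**18
shifted = [10**9 + x for x in values]
print(var_definition(shifted, probs), var_shortcut(shifted, probs))
```

The exact wrong value printed by the shortcut depends on rounding details, but it is always a multiple of 128 here, so it can never equal .3875.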


6.2 Moments and the Moment Generating Function
6.2.1 Moments of a Random Variable


We saw in Section 6.1.4 that \(E(X^2)\) could be used in the calculation of
V(X). \(E(X^2)\) is called the second moment of the random variable X.
There are useful applications of expected values of higher powers of X
as well.


Definition 6.1 The nth moment of X is \(E(X^n)\).
Note that the first moment is simply E(X).


Example 6.4 The third moment of the claim number random
variable in Example 6.3 is
\[ E(X^3) = .72(0^3) + .22(1^3) + .05(2^3) + .01(3^3) = .89. \] □
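A one-function Python sketch of Definition 6.1 (the name `moment` is our own) reproduces the first three moments of the claim number distribution used in Examples 6.3 and 6.4.

```python
# Claim-number distribution used in Examples 6.3 and 6.4
claims = {0: 0.72, 1: 0.22, 2: 0.05, 3: 0.01}

def moment(dist, n):
    """The nth moment E(X^n) = sum of x^n * p(x)."""
    return sum(x ** n * p for x, p in dist.items())

for n in (1, 2, 3):
    print(n, round(moment(claims, n), 4))   # .35, .51, .89
```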
6.2.2 The Moment Generating Function 


The definition of the moment generating function does not have an 
immediate intuitive interpretation. In this section, we will define the 
moment generating function and show how it is applied. In Section 6.2.9 
we will give an infinite series interpretation which may help the reader 
to understand the motivation behind the definition. 


Definition 6.2 Let X be a discrete random variable. The moment
generating function, denoted \(M_X(t)\), is defined by
\[ M_X(t) = E(e^{tX}) = \sum e^{tx} \cdot p(x). \]
Example 6.5 Below is the probability function table for the claim
number random variable X. We have added a row for \(e^{tx}\) so that \(M_X(t)\)
can be calculated.


x       |  0  |  1  |   2    |   3
e^(tx)  |  1  | e^t | e^(2t) | e^(3t)
p(x)    | .72 | .22 |  .05   |  .01


Then
\[ M_X(t) = .72(1) + .22(e^t) + .05(e^{2t}) + .01(e^{3t}). \]


\(M_X(t)\) is called the moment generating function because its derivatives
can be used to find the moments of X. For the function above the
derivative is
\[ M_X'(t) = 0 + .22(e^t) + .05(2)(e^{2t}) + .01(3)(e^{3t}). \]
If we evaluate the derivative at t = 0, we obtain
\[ M_X'(0) = 0 + .22(1) + .05(2) + .01(3) = .35 = E(X). \]


This is the first moment of X. The higher derivatives can be used in the
same way.
\[ M_X''(t) = 0 + .22(e^t) + .05(2^2)(e^{2t}) + .01(3^2)(e^{3t}) \]
\[ M_X''(0) = 0 + .22(1^2) + .05(2^2) + .01(3^2) = .51 = E(X^2) \] □

This result holds in general.
\[
\begin{aligned}
M_X(t) &= \sum e^{tx} \cdot p(x) \\
M_X'(t) &= \sum x \cdot e^{tx} \cdot p(x) \quad\text{and}\quad M_X'(0) = \sum x \cdot p(x) = E(X) \\
M_X''(t) &= \sum x^2 \cdot e^{tx} \cdot p(x) \quad\text{and}\quad M_X''(0) = \sum x^2 \cdot p(x) = E(X^2)
\end{aligned}
\]


The general form is the following:
\[ M_X^{(n)}(0) = E(X^n). \qquad (6.3) \]
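To see the derivative rule numerically, the Python sketch below evaluates \(M_X(t)\) for the claim number distribution and approximates \(M_X'(0)\) and \(M_X''(0)\) with central differences. The finite-difference approach and the step size h are our own illustrative choices, not part of the text; they only approximate the exact derivatives.

```python
import math

# Claim-number distribution from Example 6.5
claims = {0: 0.72, 1: 0.22, 2: 0.05, 3: 0.01}

def mgf(dist, t):
    """M_X(t) = E(e^{tX}) = sum of e^{tx} * p(x)."""
    return sum(math.exp(t * x) * p for x, p in dist.items())

h = 1e-4  # step size for the central differences (an illustrative choice)
first = (mgf(claims, h) - mgf(claims, -h)) / (2 * h)                        # approximates M'(0)
second = (mgf(claims, h) - 2 * mgf(claims, 0) + mgf(claims, -h)) / h ** 2   # approximates M''(0)
print(round(first, 6), round(second, 6))   # close to E(X) = .35 and E(X^2) = .51
```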
Many standard probability distributions have moment generating
functions which can be found fairly easily. In the next sections, we will 
give the moment generating functions for all of the random variables in 
this chapter except the hypergeometric. This will give us a way of 
deriving the mean and variance formulas stated in the previous chapter. 


6.2.3 Moment Generating Function for the Binomial Random 
Variable 


We begin with the binomial random variable with n = 1 and P(S) = p.
The distribution table needed for the moment generating function is the
following:


x       |  0  |  1
e^(tx)  |  1  | e^t
p(x)    |  q  |  p


Then
\[ M_X(t) = E(e^{tX}) = q + pe^t. \]


For n = 2, the table and moment generating function are as
follows:


x       |  0  |  1  |   2
e^(tx)  |  1  | e^t | e^(2t)
p(x)    |  q² | 2pq |  p²


\[ M_X(t) = q^2 + 2pqe^t + p^2e^{2t} = q^2 + 2q(pe^t) + (pe^t)^2 = (q + pe^t)^2. \]


The pattern should be clear. 


Binomial Distribution Moment Generating Function
(n trials, P(S) = p)


\[ M_X(t) = (q + pe^t)^n \qquad (6.4) \]


The general proof is similar to the proof for n = 2, and is outlined 
in Exercise 6-5. Once the moment generating function is derived, the 
mean and variance of the binomial distribution can be easily found. 
\[
\begin{aligned}
M_X'(t) &= n(q + pe^t)^{n-1}\,pe^t \\
M_X'(0) &= n(q + p)^{n-1}\,p = np = E(X) \\
M_X''(t) &= n\left[(q + pe^t)^{n-1}\,pe^t + (n-1)(q + pe^t)^{n-2}(pe^t)^2\right] \\
M_X''(0) &= n\left[p + (n-1)p^2\right] = np + (np)^2 - np^2 = E(X^2) \\
V(X) &= E(X^2) - [E(X)]^2 = \left(np + (np)^2 - np^2\right) - (np)^2 \\
     &= np(1 - p)
\end{aligned}
\]
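As a numerical sanity check of Equation (6.4) and the derivative results, the Python sketch below compares the moment generating function summed directly over the binomial probabilities with the closed form \((q + pe^t)^n\), then estimates the mean and variance by central differences. The parameters n = 10, p = .30 and the step size h are illustrative assumptions.

```python
import math

def binomial_mgf_from_pmf(n, p, t):
    """E(e^{tX}) summed directly over the binomial probabilities C(n,k) p^k q^(n-k)."""
    q = 1 - p
    return sum(math.exp(t * k) * math.comb(n, k) * p**k * q ** (n - k)
               for k in range(n + 1))

def binomial_mgf_closed(n, p, t):
    """Equation (6.4): (q + p e^t)^n."""
    return (1 - p + p * math.exp(t)) ** n

n, p = 10, 0.30   # illustrative parameters
for t in (-0.5, 0.0, 0.7):
    print(t, binomial_mgf_from_pmf(n, p, t), binomial_mgf_closed(n, p, t))

# Central differences at t = 0 (step h is an illustrative choice)
h = 1e-4
m_plus, m_zero, m_minus = (binomial_mgf_closed(n, p, s) for s in (h, 0.0, -h))
mean = (m_plus - m_minus) / (2 * h)                 # near np = 3
second = (m_plus - 2 * m_zero + m_minus) / h**2     # near np + (np)^2 - np^2 = 11.1
variance = second - mean**2                         # near np(1 - p) = 2.1
print(mean, variance)
```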


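6.2.4 Moment Generating Function
for the Poisson Random Variable


Poisson Distribution Moment Generating Function
(Rate \(\lambda\))


\[ M_X(t) = e^{\lambda(e^t - 1)} \qquad (6.5) \]


\[ E(e^{tX}) = \sum_{k=0}^{\infty} p(k)\,e^{tk} = \sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^k}{k!}\,e^{tk} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^t)^k}{k!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)} \]


We have already shown that \(E(X) = \lambda\). Exercise 6-6 asks the reader to
use the moment generating function to verify that \(E(X) = V(X) = \lambda\).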
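The series step in the Poisson derivation can be checked numerically. In the Python sketch below (our own illustration), the infinite sum is truncated after 100 terms, an assumption that is harmless for small \(\lambda\) and t because the remaining terms are negligible, and the result is compared with \(e^{\lambda(e^t - 1)}\) for \(\lambda = 2\).

```python
import math

def poisson_mgf_series(lam, t, terms=100):
    """Truncated sum over k < terms of p(k) e^{tk}, with p(k) = e^{-lam} lam^k / k!."""
    total = 0.0
    pk = math.exp(-lam)                 # p(0)
    for k in range(terms):
        total += pk * math.exp(t * k)
        pk *= lam / (k + 1)             # p(k+1) = p(k) * lam / (k+1)
    return total

def poisson_mgf_closed(lam, t):
    """The closed form e^{lam (e^t - 1)} derived above."""
    return math.exp(lam * (math.exp(t) - 1))

for t in (-1.0, 0.0, 0.5):
    print(t, poisson_mgf_series(2.0, t), poisson_mgf_closed(2.0, t))
```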


6.2.5 Moment Generating Function
for the Geometric Random Variable


Geometric Distribution Moment Generating Function
(P(S) = p)


\[ M_X(t) = \frac{p}{1 - qe^t} \qquad (6.6) \]

The derivation of this result relies on the sum of an infinite geometric
series, which converges when \(qe^t < 1\).


\[ E(e^{tX}) = \sum_{k=0}^{\infty} p(k)\,e^{tk} = \sum_{k=0}^{\infty} (pq^k)\,e^{tk} = p \sum_{k=0}^{\infty} (qe^t)^k = p \cdot \frac{1}{1 - qe^t} \]


We have already shown that E(X) = q/p. Exercise 6-7 asks the reader to 
use the moment generating function to find the mean and variance for X. 
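A short Python check of Equation (6.6), written as our own illustration: the geometric series is truncated at 500 terms (an assumption that is adequate only when \(qe^t\) is comfortably below 1), and the closed form refuses to evaluate when \(qe^t \ge 1\), where the series diverges.

```python
import math

def geometric_mgf_series(p, t, terms=500):
    """Truncated sum over k < terms of p(k) e^{tk}, with p(k) = p q^k."""
    q = 1 - p
    ratio = q * math.exp(t)             # the common ratio q e^t
    return sum(p * ratio**k for k in range(terms))

def geometric_mgf_closed(p, t):
    """Equation (6.6): p / (1 - q e^t), valid only when q e^t < 1."""
    q = 1 - p
    if q * math.exp(t) >= 1:
        raise ValueError("q*e^t >= 1: the geometric series diverges")
    return p / (1 - q * math.exp(t))

p = 0.60
for t in (-0.5, 0.0, 0.5):              # q*e^t is about .66 at t = .5, so the series converges
    print(t, geometric_mgf_series(p, t), geometric_mgf_closed(p, t))
```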


6.2.6 Moment Generating Function for the Negative Binomial
Random Variable


Negative Binomial Distribution Moment Generating Function
(P(S) = p; X = number of failures before the rth success)


\[ M_X(t) = \left(\frac{p}{1 - qe^t}\right)^r \qquad (6.7) \]


Note that the moment generating function for the geometric random 
variable, given by Equation (6.6), is just Equation (6.7) with r = 1. We 
will not give a derivation of this result at this time. In Chapter 11 we will 
develop machinery which will make it easier to establish this result by 
looking at the negative binomial random variable as a sum of indepen- 
dent geometric random variables. 
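Although the derivation is deferred to Chapter 11, Equation (6.7) can be checked numerically. The Python sketch below is our own illustration: it sums \(e^{tk}\) against the negative binomial probabilities \(\binom{r+k-1}{k}p^r q^k\) for k failures, truncated at 500 terms (an assumption that is fine here because \(qe^t \approx .67\)), and compares the result with the closed form. The parameters p = .50 and r = 3 are arbitrary.

```python
import math

def neg_binomial_mgf_series(p, r, t, terms=500):
    """Truncated sum over k < terms of C(r+k-1, k) p^r q^k e^{tk} (k failures before the rth success)."""
    q = 1 - p
    ratio = q * math.exp(t)
    return sum(math.comb(r + k - 1, k) * p**r * ratio**k for k in range(terms))

def neg_binomial_mgf_closed(p, r, t):
    """Equation (6.7): (p / (1 - q e^t))^r, valid when q e^t < 1."""
    q = 1 - p
    return (p / (1 - q * math.exp(t))) ** r

p, r = 0.50, 3     # illustrative parameters
for t in (-0.3, 0.0, 0.3):
    print(t, neg_binomial_mgf_series(p, r, t), neg_binomial_mgf_closed(p, r, t))
```

With r = 1 the closed form reduces to the geometric formula (6.6), as noted above.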


6.2.7 Other Uses of the Moment Generating Function 


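Moment generating functions are unique. This means that if a random
variable X has the moment generating function of a known random
variable, it must be that kind of random variable. More precisely, if two
random variables have moment generating functions that exist and agree
for all t in some open interval around 0, then they have the same
distribution.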


Example 6.6 You are working with a random variable X, and find
that its moment generating function is
\[ M_X(t) = (.2 + .8e^t)^7. \]
This is the moment generating function for a binomial random variable
with p = .80 and n = 7. Thus X is a binomial random variable with
p = .80 and n = 7. □
The technique of recognizing a random variable by its moment
generating function is common. Thus it will be very useful to be able to 
recognize the moment generating functions given in this section. 


6.2.8 A Useful Identity


If Y = aX + b, the moment generating function of Y is as follows:
\[ M_{aX+b}(t) = e^{bt} \cdot M_X(at) \]


Example 6.7 Suppose X is Poisson with \(\lambda = 2\). Let Y = 3X + 5.


Then
\[ M_X(t) = e^{2(e^t - 1)} \]
and
\[ M_Y(t) = e^{5t} \cdot M_X(3t) = e^{5t}\,e^{2(e^{3t} - 1)}. \]
A proof of this identity is outlined in Exercise 6-11. □
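Example 6.7 can be confirmed numerically. The Python sketch below (our own illustration, with a series truncated at 100 terms by assumption) computes \(E(e^{tY})\) for Y = 3X + 5 by summing over the Poisson values of X, and compares it with \(e^{bt}M_X(at)\).

```python
import math

lam, a, b = 2.0, 3, 5      # X is Poisson with lambda = 2, and Y = 3X + 5 (Example 6.7)

def poisson_pmf(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

def mgf_y_direct(t, terms=100):
    """E(e^{tY}) computed by summing e^{t(ak + b)} p(k) over the values k of X."""
    return sum(math.exp(t * (a * k + b)) * poisson_pmf(k) for k in range(terms))

def mgf_y_identity(t):
    """e^{bt} M_X(at), using the Poisson form M_X(s) = e^{lam(e^s - 1)}."""
    return math.exp(b * t) * math.exp(lam * (math.exp(a * t) - 1))

for t in (-0.2, 0.0, 0.2):
    print(t, mgf_y_direct(t), mgf_y_identity(t))
```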


6.2.9 Infinite Series and the Moment Generating Function 


We can understand why \(M_X^{(n)}(0) = E(X^n)\) if we look at an infinite series
representation of \(e^z\).
The series expansion for \(e^z\) about z = 0 is
\[ e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots. \]


If we substitute the random variable tX for z in this series, we obtain
\[ e^{tX} = 1 + tX + \frac{t^2X^2}{2!} + \frac{t^3X^3}{3!} + \cdots. \]


If we take the expected value of each side of the last equation (assuming
that the expected value of the infinite sum is the sum of the expected
values of the terms on the right-hand side), we obtain
\[ M_X(t) = E(e^{tX}) = 1 + t \cdot E(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \cdots \]


Now we can look at the derivatives of \(M_X(t)\) by differentiating the
series for \(M_X(t)\). For example,
\[ M_X'(t) = \frac{d}{dt}[M_X(t)] = E(X) + t \cdot E(X^2) + \frac{t^2}{2!}E(X^3) + \cdots \]
It is clear from this series representation that \(M_X'(0) = E(X)\). Similarly,
\[ M_X''(t) = \frac{d}{dt}[M_X'(t)] = E(X^2) + t \cdot E(X^3) + \frac{t^2}{2!}E(X^4) + \cdots \]
and we see that \(M_X''(0) = E(X^2)\).
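For a finite distribution the interchange of expectation and summation is safe, so the series can be checked directly. The Python sketch below (our own illustration; truncating after 30 terms is an assumption that works for moderate t) builds the partial sum \(\sum t^n E(X^n)/n!\) for the claim number distribution and compares it with \(M_X(t)\) computed from the definition.

```python
import math

# Claim-number distribution from Example 6.5
claims = {0: 0.72, 1: 0.22, 2: 0.05, 3: 0.01}

def moment(dist, n):
    """E(X^n); for n = 0 this is the total probability, 1."""
    return sum(x ** n * p for x, p in dist.items())

def mgf_series(dist, t, terms=30):
    """Partial sum 1 + t E(X) + t^2 E(X^2)/2! + ... with `terms` terms."""
    return sum(t**n * moment(dist, n) / math.factorial(n) for n in range(terms))

def mgf_exact(dist, t):
    """M_X(t) = sum of e^{tx} p(x), straight from Definition 6.2."""
    return sum(math.exp(t * x) * p for x, p in dist.items())

for t in (-1.0, 0.5, 1.0):
    print(t, mgf_series(claims, t), mgf_exact(claims, t))
```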


6.3 Distribution Shapes 


We can visualize the probability pattern in a distribution by plotting the
probability values in a bar graph or histogram. For example, the
geometric distribution with p = .60 has the following probability values
(rounded to three places):


x | p(x)
0 | .600
1 | .240
2 | .096
3 | .038
4 | .015
5 | .006
6 | .002
7 | .001

The histogram is shown in the following figure.


[Figure: histogram, Geometric: p = .60]


The binomial distribution with n = 20 and p = .15 has the histogram
below. (Values of x > 11 are omitted because p(x) is very small.)


[Figure: histogram, Binomial: n = 20, p = .15]

The Poisson distribution with \(\lambda = 3\) has a very similar histogram.


[Figure: histogram, Poisson, rate = 3]
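Since the figures are bar charts of probability values, a rough text version can be produced in a few lines of Python. This sketch is our own illustration (the bar width of 50 characters is an arbitrary choice); it prints one bar per value for the three distributions above, which makes the similarity between the binomial with n = 20, p = .15 and the Poisson with rate 3 easy to see.

```python
import math

def geometric_pmf(k, p):
    return p * (1 - p) ** k                     # k failures before the first success

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def text_histogram(title, probs, width=50):
    """Print one bar per value; bar length is proportional to p(x)."""
    print(title)
    for x, px in enumerate(probs):
        print(f"{x:>3} {px:6.3f} " + "#" * round(width * px))

geo = [geometric_pmf(k, 0.60) for k in range(8)]
bino = [binomial_pmf(k, 20, 0.15) for k in range(12)]
pois = [poisson_pmf(k, 3) for k in range(12)]
text_histogram("Geometric: p = .60", geo)
text_histogram("Binomial: n = 20, p = .15", bino)
text_histogram("Poisson, rate = 3", pois)
```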


In many applied problems, researchers look at histograms of the 
data in their application to try to detect the underlying distribution. 
These histograms also provide a useful hint as to the method for 
analyzing continuous distributions. Suppose we look at the binomial dis- 
tribution for n = 10 and p = .60. 


[Figure: histogram, Binomial: n = 10, p = .60]


The area of the marked bar in this histogram represents the probability 
that X = 9. The pattern of this distribution might be represented by a 
continuous curve fitted through the tops of these rectangles. 
[Figure: Binomial: Continuous Approximation]


This curve describes the pattern very well, and the area under the curve
between 8.5 and 9.5 is a good approximation of the area of the marked
bar in the histogram, which represents P(X = 9). This approximation
is helpful in understanding the probability methods for continuous
distributions in the next chapter. These methods are based on calculating
probability as an area under a curve between two points.
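The text does not say which curve is fitted. As one concrete possibility, and purely as our assumption, the Python sketch below uses a normal curve with the binomial mean np and standard deviation \(\sqrt{np(1-p)}\), and compares its area between 8.5 and 9.5 with the exact bar area P(X = 9) for n = 10, p = .60.

```python
import math

n, p = 10, 0.60
exact = math.comb(n, 9) * p**9 * (1 - p) ** (n - 9)   # P(X = 9), the marked bar

# Assumption: approximate with a normal curve having the binomial mean and standard deviation
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

def normal_cdf(z):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

area = normal_cdf((9.5 - mu) / sigma) - normal_cdf((8.5 - mu) / sigma)
print(round(exact, 4), round(area, 4))   # the two areas differ by about .001
```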


6.4 Simulation of Discrete Distributions 
6.4.1 A Coin-Tossing Example 


Suppose you plan to toss a coin ten times and bet that it will show a head
on each toss. The theoretical probabilities of each possible number of
heads are completely known. They follow a binomial distribution with
n = 10 and p = .50. We can calculate these probabilities easily. They
are given in the following table:


Heads (x) | p(x)
0         | .000977
1         | .009766
2         | .043945
3         | .117188
4         | .205078
5         | .246094
6         | .205078
7         | .117188
8         | .043945
9         | .009766
10        | .000977


However, knowing these probabilities does not enable you to experience 
what happens when you actually toss the coin ten times. You could do 
this simple experiment by actually tossing a coin ten times, but you 
could do it more rapidly and simply using a computer simulation. To 
simulate a single toss, have the computer generate a random number 
from the interval [0, 1). If the number is less than .50, call the toss a 
head. If the number is greater than or equal to .50, call the toss a tail. To
simulate ten tosses, have the computer generate ten random numbers and
apply the same rule to each. We did this in EXCEL. The results of one series of
ten “tosses” are given below.


Random Number Outcome Random Number Outcome 
0.32957 H 0.86690 T 
0.96496 T 0.03550 H 
0.10965 H 0.84940 T 
0.10876 H 0.20878 H 
0.38750 H 0.64528 T 


Since the number used is chosen at random from [0, 1), the probability
that the number is in the interval [0, .50) for heads is .50 and the
probability that the number is in the interval [.50, 1) for tails is .50. Thus
P(H) = .50 and P(T) = .50, as is desired for a fair coin.
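The coin-tossing rule translates directly into Python. In this sketch (our own illustration; the function name `toss` is hypothetical), the ten random numbers from the EXCEL table are fed through the rule, and then a fresh series of ten tosses is generated with `random.random()`, which returns a float in [0.0, 1.0).

```python
import random

def toss(u):
    """Map a random number u in [0, 1) to a coin outcome: heads if u < .50."""
    return "H" if u < 0.50 else "T"

# The ten random numbers from the EXCEL table (left pair of columns, then right)
table_numbers = [0.32957, 0.96496, 0.10965, 0.10876, 0.38750,
                 0.86690, 0.03550, 0.84940, 0.20878, 0.64528]
print([toss(u) for u in table_numbers])

# A fresh simulation of ten tosses
fresh = [toss(random.random()) for _ in range(10)]
print(fresh, fresh.count("H"), "heads")
```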

The simulation in this example merely allows us to play a game 
whose probabilities we already understand. Simulation is also used to 
study complicated probability problems which cannot be solved easily in 
closed form. We will not look at problems of that level of difficulty until 
Chapter 12. In this section we will discuss how to simulate the discrete
random variables studied in this chapter. 


6.4.2 Generating Random Numbers from [0, 1) 


The intuitive procedure used in the last section relied on the ability to 
pick a number at random from the interval [0, 1). This random pick must 
give all numbers in the interval an equal probability of being chosen, so 
that the probability of a number in the interval [0, .50) is .50. In practice, 
most people simply use the random number generator on their computers 
or calculators to find random numbers. In this section we will illustrate 
the kind of method that might be used to build a random number 
generator for a computer program. In later sections of this text, we will 
use computers to generate random numbers without showing the 
background calculations. 

A basic method for generating a sequence of random numbers is
the linear congruential method. When using this method, you must
start by selecting four non-negative integers, a, b, m and \(x_1\). The number
\(x_1\) must be less than m, and is your first number in the random sequence.
It is called the seed. To generate the second number in the sequence, \(x_2\),
calculate \(y = ax_1 + b\), divide it by m, and find the remainder. This
process can be repeated to find more numbers in the sequence: in general,
\(x_{n+1}\) is the remainder when \(ax_n + b\) is divided by m. In
practice, the values used for a, b and m are quite large, but we will
illustrate the procedure for the simpler case where a = 5, b = 7, m = 16
and \(x_1 = 5\).


Step 1: \(y = ax_1 + b = 5(5) + 7 = 32\)

Remainder when 32 is divided by 16: \(x_2 = 0\)

Step 2: \(y = ax_2 + b = 5(0) + 7 = 7\)

Remainder when 7 is divided by 16: \(x_3 = 7\)


The successive numbers in the sequence are all between 0 and 15.
We can generate numbers in [0, 1) by dividing by 16.


\[ \frac{x_1}{16} = \frac{5}{16} = .3125, \qquad \frac{x_2}{16} = \frac{0}{16} = 0, \qquad \frac{x_3}{16} = \frac{7}{16} = .4375 \]




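The results of repeating this procedure 16 times are given in the next
table.


n  | x_n | x_n/16
1  |  5  | .3125
2  |  0  | 0
3  |  7  | .4375
4  | 10  | .6250
5  |  9  | .5625
6  |  4  | .2500
7  | 11  | .6875
8  | 14  | .8750
9  | 13  | .8125
10 |  8  | .5000
11 | 15  | .9375
12 |  2  | .1250
13 |  1  | .0625
14 | 12  | .7500
15 |  3  | .1875
16 |  6  | .3750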


In the preceding example the numbers \(x_n\) were remainders after dividing
by 16, so there are only 16 possible values for \(x_n\). In fact, if we use the
last number in the table (\(x_{16} = 6\)) to find \(x_{17}\), we will find that \(x_{17} = 5\),
which was our starting point. The sequence will repeat itself after
m = 16 entries.

The random number generators used in computers are based on
much larger values of a, b, and m. For example, Klugman et al. [8]
discuss using a = 742,938,285, b = 0 and \(m = 2^{31} - 1\). These numbers
provide reasonable random number generators for practical use, and
researchers have discovered other values of a, b and m which also
appear to work well. However, the example above with m = 16
illustrates an important point. Any linear congruential generator will
eventually enter a deterministic repeating pattern. Thus it is not truly
random. For this reason, these useful generators are called
pseudo-random.
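The small generator above can be written as a Python sketch. The function name `lcg` is our own; the parameters a = 5, b = 7, m = 16 and seed \(x_1 = 5\) are the ones used in the text. Generating 17 values shows the sequence returning to the seed.

```python
def lcg(a, b, m, seed, count):
    """Linear congruential generator: x_{n+1} = (a*x_n + b) mod m, starting from x_1 = seed."""
    xs = [seed]
    while len(xs) < count:
        xs.append((a * xs[-1] + b) % m)
    return xs

xs = lcg(a=5, b=7, m=16, seed=5, count=17)
print(xs[:16])                      # x_1, ..., x_16
print([x / 16 for x in xs[:16]])    # the corresponding numbers in [0, 1)
print(xs[16])                       # x_17 = 5, so the cycle restarts
```

The same function accepts the much larger constants mentioned above; only the integers change, not the recurrence.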

In the remainder of this text, we will not require linear congruen- 
tial generator calculations for random numbers. Computers can do these 
calculations for us. We will simply use computer generated random
numbers in the interval [0, 1). 


Technology Note 


The TI-83 will generate a random number from [0,1) using the 
command “RAND” in the MATH menu under PRB. EXCEL has a 
RAND() function which will give a random number in [0, 1). MINITAB 
will generate numbers from [0, 1) using the menu choices Calc, Random 
Data, and Uniform. 


6.4.3 Simulating Any Finite Discrete Distribution 


We can use random numbers from [0, 1) to simulate any finite discrete
distribution by using an extension of the coin toss simulation reasoning.
This is best shown by an example. Suppose we are looking at the random
variable with the following probability function.


x    |  0  |  1  |  2
p(x) | .25 | .50 | .25


Given a random number x from [0, 1), we assign the outcome 0, 1
or 2 using the rule
\[
\text{outcome} =
\begin{cases}
0 & \text{if } 0 \le x < .25 \\
1 & \text{if } .25 \le x < .75 \\
2 & \text{if } .75 \le x < 1
\end{cases}
\]

We did this in an EXCEL spreadsheet. The results of 10 trials are shown
in the next table.


Random number | Outcome
.109371       | 0
.449958       | 1
.253222       | 1
.108458       | 0
.377789       | 1
.481501       | 1
.027924       | 0
.452472       | 1
.936474       | 2
.318389       | 1


The frequencies of the individual outcomes in the preceding table are
shown in the next table.


Outcome | Frequency
0       | 3
1       | 6
2       | 1


Note that with only ten trials, you should not expect to see the 
outcomes occur with exactly the same percentages as given in the 
original distribution. Even with 100 trials, the percentages of the out- 
comes do not always match the original distribution very well. The next 
table gives the results of a simulation of 100 trials for this distribution. 


A simulation of 1000 trials gives results closer to the original distribu- 
tion. The results of a single simulation of 1000 trials are given in the 
next table. 
Outcome | Frequency
0       | 245
1       | 514
2       | 241
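The outcome rule is an instance of looking up a random number in the cumulative probabilities. The Python sketch below is our own illustration using the standard `bisect` module (the names `values`, `cumulative` and `outcome` are hypothetical). It reproduces the outcomes of the ten EXCEL trials and then runs a fresh 1000-trial simulation, whose frequencies will vary from run to run but typically land near 250, 500 and 250.

```python
import bisect
import random
from collections import Counter

values = [0, 1, 2]
cumulative = [0.25, 0.75, 1.0]   # running totals of p(0), p(1), p(2)

def outcome(u):
    """Return the first value whose cumulative probability exceeds u (u in [0, 1))."""
    return values[bisect.bisect_right(cumulative, u)]

# The ten random numbers from the EXCEL trials in this section
trial_numbers = [.109371, .449958, .253222, .108458, .377789,
                 .481501, .027924, .452472, .936474, .318389]
print([outcome(u) for u in trial_numbers])

# A fresh run of 1000 trials
print(Counter(outcome(random.random()) for _ in range(1000)))
```

Using `bisect_right` makes a number exactly equal to .25 or .75 fall into the higher outcome, matching the half-open intervals in the rule.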


6.4.4 Simulating a Binomial Distribution 


The reader may have noticed that the finite discrete distribution
simulated in the last section was a binomial distribution with n = 2 and
p = .50. The method was easy to implement for that binomial due to the
small number of outcomes, but programming may become tedious if n is 
large. There is another way to simulate any binomial by having the 
computer simulate n trials and total the number of successes. For 
example, if you wish to simulate the binomial with n = 10 and p = .30, 
generate 10 random numbers x. If x < .30 on a trial, a success has 
occurred. Otherwise, the trial was a failure. The computer can be used to 
add up the number of successes to obtain the binomial outcome. In the 
next table we show the result of one simulation for n = 10 and p = .30. 


Random Number | Random Number
.53917995     | .414125
.49763993     | .335325
.53307458     | .438872
.5367283      | .377748
.41993715     | .076637


This ten-trial experiment led to nine failures and one success.
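The success-counting method can be sketched in Python as follows (our own illustration; `count_successes` and `simulate_binomial` are hypothetical names). Applied to the ten random numbers in the table with p = .30, it finds the single success; `simulate_binomial` repeats the procedure with fresh random numbers.

```python
import random

def count_successes(numbers, p):
    """Number of trials whose random number falls below p (a success)."""
    return sum(1 for u in numbers if u < p)

def simulate_binomial(n, p):
    """One binomial outcome from n fresh random numbers in [0, 1)."""
    return count_successes([random.random() for _ in range(n)], p)

numbers = [.53917995, .49763993, .53307458, .5367283, .41993715,
           .414125, .335325, .438872, .377748, .076637]
print(count_successes(numbers, 0.30))   # 1 success, so 9 failures
print(simulate_binomial(10, 0.30))      # a fresh simulated value between 0 and 10
```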
6.4.5 Simulating a Geometric Distribution 


The geometric random variable X represents the number of failures
before the first success in a series of binomial experiment trials. To
simulate it, have the computer generate random numbers for a
success-failure experiment until the first success is obtained and then count the
number of prior failures. The table in Section 6.4.4 demonstrates how
this might be done for p = .30. In that table, the first success was
obtained on trial 10, so the geometric random variable X assumes
the value 9.
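A Python sketch of the geometric simulation, with hypothetical helper names: one function replays a fixed list of random numbers, and the other draws fresh ones until the first success. Whether the table in Section 6.4.4 is read down the columns or across the rows, .076637 is the tenth number, so the replayed list gives X = 9 as in the text.

```python
import random


def geometric_from_uniforms(uniforms, p):
    """Failures before the first success (x < p), or None if no success occurs."""
    for failures, x in enumerate(uniforms):
        if x < p:
            return failures  # index of the first success = number of prior failures
    return None


def simulate_geometric(p, rng):
    """Draw random numbers until the first success and count the failures."""
    failures = 0
    while rng.random() >= p:  # a number at or above p is a failure
        failures += 1
    return failures


table = [.53917995, .49763993, .53307458, .5367283, .41993715,
         .414125, .335325, .438872, .377748, .076637]
print(geometric_from_uniforms(table, .30))        # 9
print(simulate_geometric(.30, random.Random(7)))  # hypothetical seed
```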
6.4.6 Simulating a Negative Binomial Distribution


The negative binomial random variable measures the number of failures
before the rth success. This can be simulated in the same manner as the
geometric distribution: generate trials until the rth success occurs and
count the failures along the way.
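The same loop extends to the negative binomial: keep drawing until r successes have occurred. In this sketch the class `ReplayRNG` is a hypothetical stand-in that replays a fixed list of random numbers through a `random()` method, so that a hand-checkable sequence can be traced; a `random.Random` object can be passed in the same way.

```python
import random


class ReplayRNG:
    """Hypothetical helper: random() returns the given numbers in order."""

    def __init__(self, numbers):
        self._numbers = iter(numbers)

    def random(self):
        return next(self._numbers)


def simulate_negative_binomial(p, r, rng):
    """Number of failures before the r-th success; a success is a number below p."""
    successes = failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


# .10 S, .50 F, .20 S, .90 F, .05 S  ->  2 failures before the 3rd success
print(simulate_negative_binomial(.30, 3, ReplayRNG([.10, .50, .20, .90, .05])))  # 2
print(simulate_negative_binomial(.30, 3, random.Random(11)))  # hypothetical seed
```

With r = 1 the function reduces to the geometric simulation of Section 6.4.5.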


6.4.7 Simulating Other Distributions 


Simulations are widely used, and a number of ingenious methods have
been developed for them. Many of those methods are beyond the scope
of this course, but the designers of computer programs have
implemented them so that they are available to the ordinary user. In this section we
have tried to give a basic idea of how simulations may be done, not to
show the reader how to implement every possible kind of simulation. In
practice, most people simply use computer routines which simulate the
most widely used distributions directly (without the intermediate step of
starting with random numbers from [0, 1)). The spreadsheet Microsoft®
EXCEL and the statistical program MINITAB both will simulate the
binomial and Poisson distributions directly. In addition, each program
will allow the user to input any finite discrete distribution for simulation.
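Python offers similar ready-made routines. The sketch below assumes the NumPy library is installed and uses its `Generator` methods `binomial`, `poisson`, and `choice`; the last one accepts an arbitrary finite discrete distribution through its `p` argument. The seed, parameters, and sample sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2006)  # hypothetical seed

binom = rng.binomial(n=10, p=0.30, size=1000)  # binomial outcomes, simulated directly
pois = rng.poisson(lam=0.6, size=1000)         # Poisson outcomes, simulated directly
finite = rng.choice([0, 1, 2], size=1000, p=[0.25, 0.50, 0.25])  # user-specified distribution

print(np.bincount(finite, minlength=3))  # frequencies of outcomes 0, 1, 2
print(binom.mean(), pois.mean())         # near 3 and 0.6
```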


6.5 Exercises 
6.1 Functions of Random Variables and Their Expectations 


6-1. In a year, a policyholder with an insurance company has no
claims with probability .69, 1 claim with probability .23, 2
claims with probability .07, and 3 claims with probability .01. If
X is the random variable for the number of claims, find
(a) $E(500X + 50)$; (b) $E(X^2)$; (c) ECX?).


6-2. Let X be the random variable for the sum obtained by rolling a
pair of fair dice (see Exercise 4-4). Find V(X) by using the
alternate formula $V(X) = E(X^2) - [E(X)]^2$.


6-3. Rework Example 6.2 using the logarithmic utility function
$u(w) = \ln(w + 1)$. What are $E[u(W_1)]$ and $E[u(W_2)]$ for this
utility function?
6-4. Round-off problems occur when you exceed the precision of the
computer or calculator you are using. Consider the distribution
whose values of $x$ are 1,000,000,000.1, 1,000,000,000 and
999,999,999.9, each with probability 1/3. The variance for this
distribution is .00666. If you try to compute the variance using
Equation (6.2), the value you get will depend on the precision of
your computer or calculator and may not be correct. Use your
calculator to find $E(X^2)$ and $E(X)$. Then use Equation (6.2) and
determine whether or not you found the correct value of $V(X)$.


Moments and the Moment Generating Function


6-5. Show that the moment generating function for the binomial
distribution is $(q + pe^t)^n$. Hint: Expand $(q + p)^n$ using the
binomial theorem and use it to get the moment generating function.


6-6. Use the moment generating function for the Poisson distribution
to verify that $E(X) = V(X) = \lambda$.


6-7. Use the moment generating function for the geometric
distribution to obtain its mean and variance.


6-8. Use the moment generating function for the negative binomial
distribution to obtain its mean and variance.


6-9. Let X be a discrete random variable with $p(x) = 1/n$ for
$x = 1, \ldots, n$. (X is a discrete uniform random variable.)

(a) Show that the moment generating function for X is

$$M_X(t) = \frac{e^t\,(e^{nt} - 1)}{n\,(e^t - 1)}, \qquad t \ne 0.$$

(b) Find E(X) and V(X).


6-10. Let X be a random variable whose probability function is given
below.


Find $M_X(t)$ and use its derivatives to find E(X) and $E(X^2)$.
6-11. Prove $M_{aX+b}(t) = e^{bt} \cdot M_X(at)$.


6-12. If X is a binomial random variable with p = .60 and n = 8, and
if Y = 3X + 4, what is $M_Y(t)$?


6-13. If $M_X(t) = [.70/(1 - .3e^t)]^5$, what is the distribution of X?


6.4 Simulation of Discrete Distributions


6-14. Using the linear congruence $y = 9x + 11 \pmod{16}$, with seed
$x_1 = 6$, find $x_2, x_3, \ldots, x_{16}$.


For Exercises 6-15 and 6-16, use the following sequence of
random numbers from [0, 1).

 1. .5619     6. .9983    11. .7855    16. .3729
 2. .4500     7. .0225    12. .9955    17. .1326
 3. .3566     8. .8026    13. .6558    18. .9246
 4. .5844     9. .3516    14. .1280    19. .6867
 5. .8638    10. .4584    15. .3908    20. .9638


6-15. Random numbers from [0, 1) are used to simulate a binomial
distribution with n = 20 and p = .40. If the random number $x$ is
less than .40 on a trial, then a success has occurred. Count the
number of successes in the 20 trials.


6-16. Random numbers from [0, 1) are used to simulate repeated trials
of the experiment of tossing 5 fair coins. The first five numbers
represent the first trial, the second five numbers the second, and
so on. If the random number $x$ is less than .50, the coin is a head.
How many heads appear on each of the first four repetitions of
this experiment?
Sample Actuarial Examination Problems


6-17. A baseball team has scheduled its opening game for April 1. If it
rains on April 1, the game is postponed and will be played on the
next day that it does not rain. The team purchases insurance
against rain. The policy will pay 1000 for each day, up to 2 days,
that the opening game is postponed.

The insurance company determines that the number of consecutive
days of rain beginning on April 1 is a Poisson random
variable with mean 0.6.

What is the standard deviation of the amount the insurance
company will have to pay?


6-18. Let $X_1, X_2, X_3$ be a random sample from a discrete distribution
with probability function

$$p(x) = \begin{cases} 1/3 & \text{for } x = 0 \\ 2/3 & \text{for } x = 1 \\ 0 & \text{otherwise} \end{cases}$$

Determine the moment generating function, $M(t)$, of
$Y = X_1 X_2 X_3$.


Chapter 7 
Continuous Random Variables 


7.1 Defining a Continuous Random Variable 
7.1.1 A Basic Example 


Suppose you are asked to pick a number at random from the interval 
[0, 1] with all numbers in the interval being equally likely.¹ The number
X that you pick is a random variable, since it is a numerical quantity 
whose value depends on chance. However, X is not discrete. The 
interval [0, 1] is continuous, and you can pick any number from it. X is 
therefore continuous. 

Probabilities for continuous random variables will be calculated in 
a new way. The discrete methods used in the previous chapters will not 
apply. The continuous probability method is nicely illustrated by looking 
at the random variable X above. For example, suppose that you wished 
to calculate the probability P(.50 < X < .75). Intuitively, it is natural to 
guess that this probability is .25, since 25% of the numbers in the
interval [0,1] are between .50 and .75. The probability calculation 
method for continuous random variables should give this natural answer. 

The method that is used involves the standard calculus problem of 
finding areas under curves. In Section 6.3 we noted that probabilities 
(represented by histogram areas) for a discrete random variable could be 
approximated by areas under a suitable curve. For this random variable, 


¹ The random number generator introduced in Chapter 6 would pick a rational number
from [0, 1), so that 1 was not a possible value. In this example, we pick a real number
from [0, 1], and 1 is possible.
we will find probabilities exactly by looking at areas under the curve
y = f(x) defined by

$$f(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

This function f(x) is called the density function for X. We will
calculate the probability $P(.50 \le X \le .75)$ by finding the area bounded
by f(x) and the x-axis between x = .50 and x = .75. This is pictured in
the next figure.


[Figure: Density Function]


The desired area is .25, which is the intuitively natural answer for
$P(.50 \le X \le .75)$.

To find the general probability $P(a \le X \le b)$, we find the area
bounded by the graph of f(x) and the x-axis between x = a and x = b.
This is the area of a rectangle, but we could calculate it by integration.

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$


For example,

$$P(.10 \le X \le .32) = \int_{.10}^{.32} 1\,dx = .22.$$

This also is the intuitively natural answer, since 22% of the interval is 
between .10 and .32. 
It is important to note that the total area bounded by f(x) and the
x-axis is 1.00. This tells us that $P(0 \le X \le 1) = 1$, which is certainly
true if we are picking a number in the interval [0, 1].


7.1.2 The Density Function and Probabilities 
for Continuous Random Variables 


Probabilities for any continuous random variable are computed in a 
similar fashion, using a density function and areas under the density 
function curve. The density function used will depend on the random 
variable. The following definition of a density function is based on 
properties which were illustrated in the example in Section 7.1.1. 


Definition 7.1 The probability density function of a random
variable X is a real-valued function satisfying the following properties:

(a) $f(x) \ge 0$ for all x.
(b) The total area bounded by the graph of y = f(x) and the x-axis
is 1.00.

$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \tag{7.1}$$

(c) $P(a \le X \le b)$ is given by the area under y = f(x) between
x = a and x = b.

$$P(a \le X \le b) = \int_a^b f(x)\,dx \tag{7.2}$$


Example 7.1 A risky investment has widely varying possible
return percentages for the next year. The best that can happen for this
particular investment is a return of 100%. (The investor doubles her
money by getting back the amount invested plus 100% of the amount
invested.) The worst that can happen is a return of −100%. (The
investor loses 100% of the amount she invests.) The percentage return is
a random variable X which could be anything from −1 (−100%) to 1
(100%), depending on the state of the economy in one year. The
probability density function is

$$f(x) = \begin{cases} .75(1 - x^2) & -1 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Find the probability that the return is greater than 10%.
Solution Since we are told that f(x) is a density function, we
know that $f(x) \ge 0$ and the total area under the curve is 1.00. It is still a
good idea for the reader to check these key properties. The graph of f(x)
is given in the next figure.

[Figure: Investment Density Function]

The graph shows that f(x) is non-negative. The total area under the
curve is

$$\int_{-1}^{1} f(x)\,dx = .75\left[x - \frac{x^3}{3}\right]_{-1}^{1} = 1.$$

The probability that X is greater than 10% is

$$\int_{.10}^{1} f(x)\,dx = .75\left[x - \frac{x^3}{3}\right]_{.10}^{1} = .42525.$$

The probability density function in this example makes intuitive
sense for a risky investment. The investor can make a lot or lose a lot. In
fact, the probability that X is less than −10% is also .42525. The shape
of the curve shows that the greatest gains and losses have somewhat
lower probabilities. □
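The integrals in this solution can be checked numerically. The sketch below uses a composite Simpson's rule written from scratch; the helper name `simpson` and the choice of 1,000 subintervals are our own assumptions, not part of the text. Because f(x) is a polynomial of degree 2 on [−1, 1], Simpson's rule is exact here apart from floating-point round-off.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f(x):
    """Density of the investment return X in Example 7.1."""
    return 0.75 * (1 - x**2) if -1 <= x <= 1 else 0.0


print(round(simpson(f, -1, 1), 5))      # total area under the curve: 1.0
print(round(simpson(f, 0.10, 1), 5))    # P(X > .10): 0.42525
print(round(simpson(f, -1, -0.10), 5))  # P(X < -.10): 0.42525, by symmetry
```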
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7.1.3 Building a Straight-Line Density Function 
for an Insurance Loss 


In this section we will look at an example in which we derive the density 
function for a random variable based on simple assumptions about its 
behavior. 


Example 7.2 You are going to offer a warranty insurance policy 
which pays for repairs on a new appliance in the next year. Your 
experience indicates that repair costs X on a single policy will be in the 
interval [0, 1000]. Probability will be highest for the lowest costs (those 
near 0), and will fall off in a straight line fashion until x reaches 1000. 
Find an appropriate density function, and calculate P(X > 600).

Solution The density function will be a straight line segment of
negative slope, starting at x = 0 and ending at x = 1000. It is pictured in
the graph below.


[Figure: Loss Severity Density Function (horizontal axis: Loss Amount x)]


The straight line and the two axes bound a triangle with base 1000. To
make the total area under the curve equal 1.00, we need a height of .002,
since (1/2)(1000)(.002) = 1. Thus f(0) = .002 and f(1000) = 0. Once these
values are specified, we can find the equation of the straight line.

$$f(x) = \begin{cases} .002 - .000002x & 0 \le x \le 1000 \\ 0 & \text{otherwise} \end{cases}$$
The probability P(X > 600) is the area of the triangle to the right of
x = 600 and below the line segment. Thus

$$P(X > 600) = \frac{400 \cdot f(600)}{2} = 200(.0008) = .16.$$

For straight-line densities, it is usually easier to find probabilities as
areas of trapezoids or triangles. The reader can check that integration
would give the same answer.

$$\int_{600}^{1000} (.002 - .000002x)\,dx = .16 \qquad \square$$
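The two steps of this example, fixing the height from the area condition and then computing P(X > 600), can be reproduced in a few lines of Python. This is a sketch with illustrative variable names; the antiderivative is written out by hand rather than obtained from a library.

```python
base = 1000.0
height = 2 / base          # (1/2) * base * height = 1  =>  height = .002
slope = -height / base     # line through (0, .002) and (1000, 0): slope -.000002


def f(x):
    """Straight-line loss severity density of Example 7.2."""
    return height + slope * x if 0 <= x <= base else 0.0


def antiderivative(x):
    """An antiderivative of height + slope * x."""
    return height * x + slope * x**2 / 2


triangle = 0.5 * (base - 600) * f(600)                 # area of the right-hand triangle
integral = antiderivative(1000) - antiderivative(600)  # same area by integration
print(round(triangle, 4), round(integral, 4))          # both give .16
```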


7.1.4 The Cumulative Distribution Function F(x) 


In Chapter 4 we defined the cumulative distribution function F(x) by
$F(x) = P(X \le x)$.

The definition of F(x) is the same for discrete and continuous random
variables, but the calculations for continuous random variables use
integration rather than discrete summation.

$$F(x) = \int_{-\infty}^{x} f(u)\,du \tag{7.3}$$


Example 7.3 We return to the loss severity distribution in
Example 7.2. For x in the interval [0, 1000], F(x) is the area under the
density curve from 0 to x.

[Figure: Loss Severity Density Function (horizontal axis: Loss Amount x)]
We can calculate this area as the area of a trapezoid or by integration.

$$F(x) = \int_0^x (.002 - .000002u)\,du = .002x - .000001x^2, \qquad 0 \le x \le 1000$$

Note that F(x) = 0 for x < 0 and F(x) = 1 for x > 1000. The graph of
F(x) is shown below.

[Figure: Loss Severity Cumulative Distribution Function]

Since F(x) is defined by integrating f(x), it is clear that the
derivative of F(x) is f(x). This simple relationship is very important
when the derivative F′(x) exists.

$$F'(x) = f(x) \tag{7.4}$$
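Equation (7.4) can be checked numerically: a central-difference quotient of F should reproduce f. The step size h = .001 and the helper names are illustrative assumptions. Because F is quadratic on [0, 1000], the central difference is exact there apart from floating-point round-off.

```python
def F(x):
    """CDF of the loss severity distribution (Examples 7.2 and 7.3)."""
    if x < 0:
        return 0.0
    if x > 1000:
        return 1.0
    return 0.002 * x - 0.000001 * x**2


def f(x):
    """Loss severity density."""
    return 0.002 - 0.000002 * x if 0 <= x <= 1000 else 0.0


def central_difference(g, x, h=1e-3):
    """Numerical approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


for x in [100, 400, 750]:
    print(x, central_difference(F, x), f(x))  # the two columns agree
```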


7.1.5 A Piecewise Density Function 


The density function for a continuous random variable can be defined 
piecewise and fail to be continuous at some points, as the following 
example shows. 


Example 7.4 A company has made a loan which has a variable
interest rate. One month from now interest will be due, but the rate is not
known now. It will be set then, based on the value of a short-term
borrowing rate which changes daily. The company believes that the
density function given below is a reasonable one for this future interest rate.
$$f(x) = \begin{cases} 0 & x < 0 \\ 560x & 0 \le x \le .05 \\ -15x + 3.75 & .05 < x \le .25 \\ 0 & x > .25 \end{cases}$$


The graph of f(x) for $0 \le x \le .25$ is shown below. Note that f(x) is not
continuous at x = .05.

[Figure: Interest Rate Density Function]

The company is projecting higher probabilities for rates below 5%, but
is allowing the possibility of rates above 5%. The total area under this
density function breaks into two triangular pieces whose areas can be
easily calculated.


$$P(0 \le X \le .05) = \int_0^{.05} 560x\,dx = .70$$

$$P(.05 \le X \le .25) = \int_{.05}^{.25} (-15x + 3.75)\,dx = .30$$

The total area is 1.00. Other probabilities may also involve two
calculations similar to the above. For example,

$$P(.03 \le X \le .07) = \int_{.03}^{.05} 560x\,dx + \int_{.05}^{.07} (-15x + 3.75)\,dx = .448 + .057 = .505.$$
It is important to note that the values of f(x) are not themselves
probabilities; they define areas which give probabilities. The values of f(x)
must be nonnegative, but they can be greater than one, as in this example.
For example, f(.04) = 560(.04) = 22.40. This value of 22.40 cannot be
a probability, but

$$P(.039 \le X \le .041) = \int_{.039}^{.041} 560x\,dx = .0448.$$


The cumulative distribution function F(x) must be calculated in pieces.

$$F(x) = P(0 \le X \le x) = \int_0^x 560u\,du = 280x^2, \qquad 0 \le x \le .05$$

$$F(.05) = .70$$

$$F(x) = P(0 \le X \le x) = .70 + \int_{.05}^{x} (-15u + 3.75)\,du = -7.5x^2 + 3.75x + .53125, \qquad .05 \le x \le .25$$


The graph of F(x) for $0 \le x \le .25$ is pictured below.

[Figure: Interest Rate Cumulative Distribution Function]

Note that even though f(x) is not continuous, F(x) is continuous.
However, F(x) is not differentiable everywhere, since F′(x) is not
defined at .05. Values of F(x) are probabilities and must be in the
interval [0, 1]. □
Technology Note


The density functions used in this section were simple enough that
no special help was needed to integrate them. In later sections we will
deal with more complex density functions which must be integrated
numerically. The TI-83, TI-89 or TI-92 calculators will do those
integrals for us.

The piecewise function in this section was not demanding, but it
required a tedious calculation. Piecewise functions can be defined on the
TI-89 or TI-92 using the "when" operator. Once this is done, calculations
can be done more rapidly. For example, the author found F(x) for the
piecewise function in Example 7.4 with a single integration statement on
the TI-89.
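A rough Python analogue of this approach is sketched below. The density and the cumulative distribution function of Example 7.4 are each defined once as piecewise functions, with ordinary `if` branches standing in for the calculator's "when" operator (an illustrative substitute, not a translation of any calculator program). Probabilities are then computed as F(b) − F(a), using the pieces 280x² and −7.5x² + 3.75x + .53125 from Example 7.4; the function names are hypothetical.

```python
def f(x):
    """Piecewise interest-rate density of Example 7.4."""
    if 0 <= x <= 0.05:
        return 560 * x
    if 0.05 < x <= 0.25:
        return -15 * x + 3.75
    return 0.0


def F(x):
    """Cumulative distribution function, built piece by piece."""
    if x < 0:
        return 0.0
    if x <= 0.05:
        return 280 * x**2
    if x <= 0.25:
        return -7.5 * x**2 + 3.75 * x + 0.53125
    return 1.0


def prob(a, b):
    """P(a <= X <= b) = F(b) - F(a) for a continuous random variable."""
    return F(b) - F(a)


print(round(prob(0, 0.05), 4), round(prob(0.05, 0.25), 4))  # .70 and .30
print(round(prob(0.03, 0.07), 4))                           # .505
print(round(f(0.04), 2), round(prob(0.039, 0.041), 4))      # 22.4 (not a probability) and .0448
```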


7.2 The Mode, the Median, and Percentiles


In Chapter 4, we looked at two measures of central tendency for discrete 
random variables: the mean and the mode. We will look at the mean of a 
continuous random variable in Section 7.3. In this section, we will look 
at the mode of a continuous random variable and introduce another 
commonly used measure of central tendency, the median. 

For a discrete random variable, the mode was defined to be the 
value of x for which the probability p(x) was highest. For a continuous 
random variable, we look at the density function f(x). 


Definition 7.2 The mode of a continuous random variable is the 
value of x for which the density function f(x) is a maximum.


Example 7.5 In Example 7.1, we looked at X, the percentage
return on an investment. The density function was

$$f(x) = \begin{cases} .75(1 - x^2) & -1 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

f(x) is maximized when x = 0, so the mode is 0. □


Example 7.6 In Example 7.4, we looked at a variable interest rate
whose density function f(x) was defined piecewise. The maximum
value of f(x) occurred at x = .05. The mode is .05. □
Example 7.7 Let X be the random variable for the value of a
number picked at random from [0, 1]. Then

$$f(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

f(x) is constant on [0, 1] and does not have a unique maximum. Any x
in the interval [0, 1] is a mode. □


Definition 7.3 The median m of a continuous random variable X
is the solution of the equation

$$F(m) = P(X \le m) = .50. \tag{7.5}$$


Example 7.8 The loss severity distribution in Example 7.2 had
the following density and cumulative distribution functions.

$$f(x) = \begin{cases} .002 - .000002x & 0 \le x \le 1000 \\ 0 & \text{otherwise} \end{cases}$$

$$F(x) = \int_0^x (.002 - .000002u)\,du = .002x - .000001x^2, \qquad 0 \le x \le 1000$$

The median m can be found by solving F(m) = .50 for m.

$$.002m - .000001m^2 = .50$$

The solution to this quadratic equation in the interval [0, 1000] is
m = 292.89. This has a nice intuitive interpretation. Half of all losses
will be less than 292.89; the other half will be greater. Note that the
mode of this distribution is 0. The median and the mode are not
necessarily equal. □
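Multiplying .002m − .000001m² = .50 by −1,000,000 gives m² − 2000m + 500,000 = 0, and the quadratic formula gives m = 1000 ± √500,000. A short sketch of that calculation, keeping only the root that lies in [0, 1000] (the variable names are illustrative):

```python
import math

# F(m) = .002m - .000001m^2 = .50  <=>  m^2 - 2000m + 500000 = 0
a, b, c = 1.0, -2000.0, 500_000.0
disc = math.sqrt(b**2 - 4 * a * c)
roots = [(-b - disc) / (2 * a), (-b + disc) / (2 * a)]
median = next(r for r in roots if 0 <= r <= 1000)  # keep the root inside [0, 1000]
print(round(median, 2))  # 292.89
```

The other root, about 1707.1, lies outside the interval where F(x) = .002x − .000001x² is valid, so it is discarded.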


If the density function is symmetric, the median can be found
without calculation. For example, if X is a random number chosen from
[0, 1], the median is clearly m = .50. If X is the random variable of
investment returns in Example 7.1, the density function graph is
symmetric about 0.
[Figure: Investment Density Function]


It should be clear from the graph that m = 0.

For the loss severity example, the median could be interpreted as
separating the top 50% of losses from the bottom 50%. For this reason,
the median is called the 50th percentile. Other percentiles can be
defined using similar reasoning. For example, the 90th percentile
separates the top 10% from the bottom 90%. Percentiles are defined in
general in the next definition.


Definition 7.4 Let X be a continuous random variable and
0 < p < 1. The 100pth percentile of X is the number $x_p$ defined by

$$F(x_p) = p.$$


Example 7.9 The 90th percentile of the loss severity distribution
is found by solving

$$.002x_{.90} - .000001x_{.90}^2 = .90.$$

The solution in the interval [0, 1000] is $x_{.90} = 683.77$. □

The median and percentiles are more difficult to find for piecewise
densities, since one must first find which piece contains the median or
the desired percentile. This will be necessary in Exercise 7-7.
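One way to avoid locating the right piece by hand is to solve F(x_p) = p numerically. The bisection sketch below assumes only that F is continuous and nondecreasing on a bracketing interval [lo, hi] with F(lo) ≤ p ≤ F(hi); the function names and the tolerance are illustrative. It reproduces the loss severity results and also applies unchanged to the piecewise interest-rate CDF of Example 7.4.

```python
def percentile(F, p, lo, hi, tol=1e-10):
    """Solve F(x) = p by bisection, assuming F is continuous and
    nondecreasing with F(lo) <= p <= F(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < p:
            lo = mid  # the percentile lies to the right of mid
        else:
            hi = mid  # the percentile lies at or to the left of mid
    return (lo + hi) / 2


def F_loss(x):
    """Loss severity CDF on [0, 1000] (Example 7.8)."""
    return 0.002 * x - 0.000001 * x**2


def F_rate(x):
    """Piecewise interest-rate CDF on [0, .25] (Example 7.4)."""
    return 280 * x**2 if x <= 0.05 else -7.5 * x**2 + 3.75 * x + 0.53125


print(round(percentile(F_loss, 0.50, 0, 1000), 2))  # median: 292.89
print(round(percentile(F_loss, 0.90, 0, 1000), 2))  # 90th percentile: 683.77
print(round(percentile(F_rate, 0.50, 0, 0.25), 4))  # median interest rate
```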
7.3 The Mean and Variance of a Continuous Random Variable

7.3.1 The Expected Value of a Continuous Random Variable


In Chapter 4, the expected value of a discrete random variable X was
defined as

$$E(X) = \sum x \cdot p(x).$$

Using the integral as a continuous sum, we can similarly define the
expected value of a continuous random variable X.


Definition 7.5 Let X be a continuous random variable with
density function f(x). The expected value of X is

$$E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx \tag{7.6}$$

E(X) is also denoted by μ, and referred to as the mean of X.


Example 7.10 Let X be the loss severity random variable from
Example 7.2.

$$f(x) = \begin{cases} .002 - .000002x & 0 \le x \le 1000 \\ 0 & \text{otherwise} \end{cases}$$

$$E(X) = \int_0^{1000} (.002x - .000002x^2)\,dx = \frac{1000}{3} \approx 333.33 \qquad \square$$


Note that the mean is not equal to the median for the loss severity 
distribution. (The median is approximately 292.89.) This illustrates that 
the mean and median are not necessarily equal. The next example 
illustrates a case where the two are equal. 


Example 7.11 Let X be a number chosen at random from [0, 1].

$$E(X) = \int_0^1 x \cdot 1\,dx = .50 \qquad \square$$


The mean equals the median for the random number X. The reader 
will be asked to show in Exercise 7-10 that for the random variable of 
investment values in Example 7.1, the mean equals the median of 0. The 
mean will equal the median when the graph of the density function is 
symmetric. 

Finding the mean when the density function is defined piecewise 
requires a bit more calculation. 


Example 7.12 The interest rate random variable in Example 7.4
had density function

$$f(x) = \begin{cases} 560x & 0 \le x \le .05 \\ -15x + 3.75 & .05 < x \le .25 \\ 0 & \text{otherwise} \end{cases}$$

$$E(X) = \int_0^{.05} 560x^2\,dx + \int_{.05}^{.25} (-15x^2 + 3.75x)\,dx = .0233 + .035 = .05833 \qquad \square$$
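Both means can be checked with numerical integration of x · f(x). This sketch reuses a hand-written composite Simpson's rule (an assumption of ours, not a method from the text). The piecewise density is integrated piece by piece, splitting at x = .05, because Simpson's rule expects a smooth integrand on each interval; each piece here is a polynomial of degree at most 2, so the results are exact apart from round-off.

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def f_loss(x):
    """Loss severity density on [0, 1000] (Example 7.10)."""
    return 0.002 - 0.000002 * x


mean_loss = simpson(lambda x: x * f_loss(x), 0, 1000)

# Interest-rate density of Example 7.12, split at its break point x = .05
mean_rate = (simpson(lambda x: x * 560 * x, 0, 0.05)
             + simpson(lambda x: x * (-15 * x + 3.75), 0.05, 0.25))

print(round(mean_loss, 2), round(mean_rate, 5))  # 333.33 and 0.05833
```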


7.3.2 The Expected Value of a Function of a Random Variable

Suppose X is a random variable, but we are actually interested in the
random variable g(X). In Section 6.1 we discussed how to find E[g(X)]
if X is discrete with probability function p(x),

$$E[g(X)] = \sum g(x) \cdot p(x).$$

The result for continuous random variables is similar, with summation
replaced by integration.


Expected Value of a Function of a Continuous Random Variable
X continuous with density function f(x)

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) \cdot f(x)\,dx \tag{7.7}$$

Dealing with functions of random variables can be tricky. We will
not give a proof of Equation (7.7) here, but we will discuss finding the
density function for g(X) in a later section. At this point, we will
concentrate on applying Equation (7.7). One common application occurs
when g(x) = ax + b.


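$$E[g(X)] = \int_{-\infty}^{\infty} (ax + b)\,f(x)\,dx = a\int_{-\infty}^{\infty} x\,f(x)\,dx + b\int_{-\infty}^{\infty} f(x)\,dx = a \cdot E(X) + b \cdot 1$$

Thus for any discrete or continuous random variable X,

$$E(aX + b) = a \cdot E(X) + b.$$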


Example 7.13 Let X be the loss severity random variable of
Example 7.2. In Example 7.10 we showed that E(X) = 333.33. The
random variable is the amount of loss on one policy in the next year.
Suppose that next year is 1999, but you also wish to project costs Y for
the year 2000. You believe that costs will inflate by 5% for the year
2000. Then the inflated cost for the year 2000 is Y = 1.05X, and

$$E(Y) = E(1.05X) = 1.05 \cdot E(X) = 350. \qquad \square$$


We will use Equation (7.7) in many applications throughout this
chapter. In the next section, we will use it in the definition of the
variance of a continuous random variable.


7.3.3 The Variance of a Continuous Random Variable

In Chapter 4 we defined the variance of a discrete random variable to be
$E[(X - \mu)^2]$. This expectation also defines the variance of a continuous
random variable, but the expectation is calculated using integration
instead of summation.


Definition 7.6 Let X be a continuous random variable with
density function f(x) and mean μ. Then the variance of X is defined by

$$V(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx \tag{7.9}$$
The square root of the variance is called the standard deviation and
denoted by the Greek letter sigma.

$$\sigma = \sqrt{V(X)}, \qquad \sigma^2 = V(X)$$


Example 7.14 Let X be a number chosen at random from [0, 1].
In Example 7.11, we showed that E(X) = .50. Then

$$V(X) = E[(X - .50)^2] = \int_0^1 (x - .50)^2 \cdot 1\,dx = \frac{1}{12}. \qquad \square$$


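In Chapter 6 we showed that for a discrete random variable X

$$V(X) = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2. \tag{7.10}$$

This result can also be derived for continuous random variables.

$$E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x^2 - 2\mu x + \mu^2)\,f(x)\,dx$$

$$= \int_{-\infty}^{\infty} x^2 f(x)\,dx - 2\mu \int_{-\infty}^{\infty} x\,f(x)\,dx + \mu^2 \int_{-\infty}^{\infty} f(x)\,dx$$

$$= E(X^2) - 2\mu \cdot \mu + \mu^2 \cdot 1 = E(X^2) - \mu^2$$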


We noted in Chapter 6 that Equation (7.10) is often preferred for
calculations that must be done by hand. The definition of variance in
Equation (7.9) gives a calculation method which avoids certain
round-off error problems, and is preferred for computer solutions. In the next
example we illustrate how Equation (7.10) might be used to shorten
computation time for a traditional hand calculation.
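The round-off issue shows up on any computer that uses double-precision floating point. The sketch below applies both formulas to a three-point distribution with values 1,000,000,000.1, 1,000,000,000 and 999,999,999.9, each with probability 1/3, whose true variance is .00666... . The shortcut subtracts two numbers near 10^18, where neighboring doubles are more than 100 apart, so its result is essentially pure round-off. The exact wrong value it prints depends on the platform's arithmetic, so no particular output is claimed for it.

```python
values = [1_000_000_000.1, 1_000_000_000.0, 999_999_999.9]
probs = [1 / 3] * 3

mu = sum(p * x for p, x in zip(probs, values))

# Defining formula, as in Equation (7.9): weighted squared deviations from the mean
var_definition = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))

# Shortcut, as in Equation (7.10): E(X^2) - mu^2 subtracts two numbers near 10**18
e_x2 = sum(p * x * x for p, x in zip(probs, values))
var_shortcut = e_x2 - mu**2

print(var_definition)  # close to the true variance .00666...
print(var_shortcut)    # dominated by round-off in double precision
```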


Example 7.15 Let X be the loss severity random variable of
Example 7.2. We showed in Example 7.10 that

$$E(X) = \frac{1000}{3} = 333.33.$$

In order to use Equation (7.10), we need only calculate $E(X^2)$.

$$E(X^2) = \int_0^{1000} x^2(.002 - .000002x)\,dx = 166{,}666.66$$

$$V(X) = 166{,}666.66 - \left(\frac{1000}{3}\right)^2 = 55{,}555.55$$


Calculation of V(X) from the defining Equation (7.9) would require
evaluation of the integral

$$\int_0^{1000} \left(x - \frac{1000}{3}\right)^2 (.002 - .000002x)\,dx.$$

This calculation is straightforward, but much more time-consuming if
done by hand. If the calculation is done on a computer or powerful
calculator, calculation time is not an issue. □
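On a computer both routes take the same effort. The sketch below evaluates Equation (7.10) and the defining Equation (7.9) for the loss severity distribution with a hand-written Simpson's rule (our own helper, not from the text); both integrands are polynomials of degree 3, which Simpson's rule integrates exactly apart from round-off. The printed values are 500,000/9 rounded to two decimals, 55,555.56, which matches the text's 55,555.55 except in the last digit, where the text truncates.

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def f(x):
    """Loss severity density on [0, 1000]."""
    return 0.002 - 0.000002 * x


mu = simpson(lambda x: x * f(x), 0, 1000)        # 1000/3
ex2 = simpson(lambda x: x**2 * f(x), 0, 1000)    # about 166,666.67
var_shortcut = ex2 - mu**2                        # Equation (7.10)
var_definition = simpson(lambda x: (x - mu) ** 2 * f(x), 0, 1000)  # Equation (7.9)
print(round(var_shortcut, 2), round(var_definition, 2))
```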


We have already used Equation (7.7) to derive the expected value
of a linear function of a continuous random variable X, which was
$E(aX + b) = a \cdot E(X) + b = a\mu + b$. We can also derive a formula for
V(aX + b). If Y = aX + b, then

$$Y - E(Y) = aX + b - (a\mu + b) = a(X - \mu).$$

Then

$$V(Y) = E[(Y - E(Y))^2] = E[a^2(X - \mu)^2] = a^2 \cdot E[(X - \mu)^2] = a^2 \cdot V(X).$$

$$V(aX + b) = a^2 \cdot V(X) \tag{7.11}$$


The expressions for E(aX + b) and V(aX + b) derived here for
continuous random variables are identical with those derived earlier for
discrete random variables.


Example 7.16 In Example 7.13, we looked at the effect of 5%
inflation on the loss severity random variable X. The random variable
for loss severity after inflation was Y = 1.05X. In Example 7.15 we
showed that V(X) = 55,555.55. Then

$$V(Y) = V(1.05X) = 1.05^2(55{,}555.55) = 61{,}250. \qquad \square$$
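A simulation gives an independent check of E(Y) = 350 and V(Y) = 61,250. This sketch uses the inverse-transform method, a technique not introduced in this section: solving F(x) = .002x − .000001x² = u for x in [0, 1000] gives x = 1000(1 − √(1 − u)). The seed and sample size are hypothetical choices, so the printed values only approximate the exact results.

```python
import math
import random
import statistics


def sample_loss(rng):
    """Inverse transform: solve F(x) = u with F(x) = .002x - .000001x^2."""
    u = rng.random()  # u in [0, 1)
    return 1000 * (1 - math.sqrt(1 - u))


rng = random.Random(1999)  # hypothetical seed
y = [1.05 * sample_loss(rng) for _ in range(100_000)]  # inflated losses Y = 1.05X
print(statistics.fmean(y), statistics.pvariance(y))     # near 350 and 61,250
```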
Exercises


7.1 Defining a Continuous Random Variable


7-1. Let $f(x) = 1.5x + .25$, for $0 \le x \le 1$, and $f(x) = 0$ elsewhere.
(a) Show that f(x) is a probability density function.
(b) What is the cumulative distribution function?
(c) Find PO € X < 3) and P(; € X < i).


7-2. Let f(x) = a(e^?* — e-?*), for x > 0, and f(x) = 0 elsewhere.
(a) Find a so that f(x) is a probability density function.
(b) What is P(X < 1)?


7-3. Let

$$f(x) = \begin{cases} 25x & 0 \le x \le .20 \\ 1.5625(1 - x) & .20 < x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$

Find P(.10 < X < .60).


7-4. Let $f(x) = a/(1 + x^2)$, for x > 0, and f(x) = 0 elsewhere.
(a) Find a so that f(x) is a probability density function.
(b) What is P(X < 1)?


7.2 The Mode, the Median, and Percentiles


7-5. For the density function in Exercise 7-1, find $x_{.25}$, $x_{.50}$ and $x_{.75}$.


7-6. Let $f(x) = e^x$, for $0 \le x \le \ln 2$, and f(x) = 0 elsewhere.
(a) Find $x_{.50}$ and $x_{.90}$.
(b) What is the mode of this distribution?


7-7. For the density function in Exercise 7-3, find the median and
$x_{.80}$.
7.3 The Mean and Variance of a Continuous Random Variable


7-8. If X is the random variable whose density function is defined in
Exercise 7-1, what are E(X) and V(X)?


7-9. If X is the random variable whose density function is defined in
Exercise 7-3, what is E(X)?


7-10. For the random variable in Example 7.1 whose density function
is $f(x) = .75(1 - x^2)$, for $-1 \le x \le 1$, and f(x) = 0 elsewhere,
show that both the mean and the median are equal to 0.


7-11. Let X be a random variable whose density function is $f(x) = a/(1 + x^2)$
for x > 0, and 0 elsewhere (Exercise 7-4). Show that E(X)
does not exist.


Sample Actuarial Examination Problems


7-12. The lifetime of a machine part has a continuous distribution on
the interval (0, 40) with probability density function f, where
f(x) is proportional to $(10 + x)^{-2}$.

Calculate the probability that the lifetime of the machine part is
less than 6.


7-13. An insurer's annual weather-related loss, X, is a random variable
with density function

$$f(x) = \begin{cases} \dfrac{2.5(200)^{2.5}}{x^{3.5}} & \text{for } x > 200 \\ 0 & \text{otherwise} \end{cases}$$

Calculate the difference between the 30th and 70th percentiles of X.
7-14. An insurance company's monthly claims are modeled by a
continuous, positive random variable X, whose probability
density function is proportional to $(1 + x)^{-4}$, where $0 < x < \infty$.

Determine the company's expected monthly claims.


7-15. Let X be a continuous random variable with density function

$$f(x) = \begin{cases} \dfrac{|x|}{10} & \text{for } -2 \le x \le 4 \\ 0 & \text{otherwise} \end{cases}$$

Calculate the expected value of X.


7-16. The loss due to a fire in a commercial building is modeled by a
random variable X with density function

$$f(x) = \begin{cases} .005(20 - x) & \text{for } 0 < x < 20 \\ 0 & \text{otherwise} \end{cases}$$

Given that a fire loss exceeds 8, what is the probability that it
exceeds 16?


7-17. An insurance company insures a large number of homes. The
insured value, X, of a randomly selected home is assumed to
follow a distribution with density function

$$f(x) = \begin{cases} 3x^{-4} & \text{for } x > 1 \\ 0 & \text{otherwise} \end{cases}$$

Given that a randomly selected home is insured for at least 1.5,
what is the probability that it is insured for less than 2?


Chapter 8 
Commonly Used Continuous 
Distributions 


8.1 The Uniform Distribution 
8.1.1 The Uniform Density Function 


The uniform distribution is the first of a series of useful continuous 
probability distributions which will be studied in this chapter. It is 
covered first because it is the simplest. We have already seen an example
of a random variable X which has a uniform distribution. In Section 
7.1.1, we looked at X, the value of a number picked at random from the 
interval [0, 1]. The density function was constant (at 1) on the interval 
[0, 1], and 0 otherwise. 


$$f(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

The general uniform density function is constant on an interval
[a, b], and 0 otherwise. To ensure that the area bounded by the density
function and the x-axis is 1, the constant value must be $\frac{1}{b - a}$:

Uniform Density Function
X uniform on [a, b]

$$f(x) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
[Figure: Uniform Density Function]


The graph of the uniform density function is pictured above. The
graph shows that probabilities are areas of rectangles of height
1/(b − a): for $a \le c \le d \le b$,

$$P(c \le X \le d) = \frac{d - c}{b - a}. \tag{8.2}$$

Example 8.1 A company is expecting to receive payment of a 
large bill sometime today. The time X until the payment is received is 
uniformly distributed over the interval [1,9], sometime between 1 and 9 
hours from now, with all times in the interval being equally likely. The 
density function for X is 


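$$f(x) = \begin{cases} \frac{1}{8} & 1 \le x \le 9 \\ 0 & \text{otherwise} \end{cases}$$

The probability that the time of receipt is between 2 and 5 hours from now is

$$P(2 \le X \le 5) = \frac{5-2}{9-1} = \frac{3}{8}. \qquad \square$$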
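Because a uniform probability is just a rectangle area, it can be computed without integration. The Python sketch below is one way to do it; the function name `uniform_prob` is our own, and clipping [c, d] to the support [a, b] is a convenience we added beyond the formula in the text.

```python
def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X uniform on [a, b]."""
    lo, hi = max(c, a), min(d, b)        # clip [c, d] to the support [a, b]
    return max(hi - lo, 0.0) / (b - a)   # base times the height 1/(b - a)


# Example 8.1: payment time X uniform on [1, 9]
print(uniform_prob(2, 5, 1, 9))  # 0.375, i.e. 3/8
```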


8.1.2 The Cumulative Distribution Function 
for a Uniform Random Variable 


Equation (8.2) can be used to find P(X ≤ x) for values of x in the interval [a, b].

$$P(X \le x) = P(a \le X \le x) = \frac{x-a}{b-a}, \quad \text{for } a \le x \le b$$


Then the cumulative distribution function F(x) for a uniform random variable X on [a, b] can be defined.


Uniform Cumulative Distribution Function
X uniform on [a, b]

$$F(x) = \begin{cases} 0 & x < a \\ \frac{x-a}{b-a} & a \le x \le b \\ 1 & x > b \end{cases}$$


Example 8.2 Let X be the random variable for time of payment 
receipt in Example 8.1. X is uniform on [1,9]. The cumulative distribu- 
tion is given by 


$$F(x) = \begin{cases} 0 & x < 1 \\ \frac{x-1}{8} & 1 \le x \le 9 \\ 1 & x > 9 \end{cases}$$

As the graph shows, the cumulative distribution function is a straight line between a = 1 and b = 9. □
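The piecewise definition of F translates directly into code. This is a minimal sketch (the name `uniform_cdf` is our own) that evaluates the cumulative distribution of Example 8.2 at a few points and recovers the probability from Example 8.1 as a difference of F values.

```python
def uniform_cdf(x, a, b):
    """Cumulative distribution F(x) = P(X <= x) for X uniform on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)   # straight line from (a, 0) to (b, 1)


# Example 8.2: X uniform on [1, 9]
for x in (0, 1, 5, 9, 12):
    print(x, uniform_cdf(x, 1, 9))

# F also gives interval probabilities: F(5) - F(2) = 3/8
print(uniform_cdf(5, 1, 9) - uniform_cdf(2, 1, 9))  # 0.375
```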


8.1.3 Uniform Random Variables for Lifetimes; Survival 
Functions 


In many applied probability problems, the random variable of interest is 
a time variable T. This time variable could be the time until death of a
person, which is a standard insurance application. However, the same 
mathematics can be used to analyze the time until a machine part fails,
the time until a disease ends, or the time it takes to serve a customer in a 
store. The uniform distribution does not give a very realistic model for 
human lifetimes, but it is often used as an illustration of a lifetime model 
because of its simplicity. 


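Example 8.3 Let T be the time from birth until death of a randomly selected member of a population. Assume that T has a uniform distribution on [0, 100].¹ Then

$$f(t) = \begin{cases} \frac{1}{100} & 0 \le t \le 100 \\ 0 & \text{otherwise} \end{cases}$$

and

$$F(t) = \begin{cases} 0 & t < 0 \\ \frac{t}{100} & 0 \le t \le 100 \\ 1 & t > 100 \end{cases}$$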


The function F(t) gives us the probability that the person dies by age t. For example, the probability of death by age 57 is

$$P(T \le 57) = F(57) = \frac{57}{100} = .57.$$

Most of us are interested in the probability that we will survive past a certain age. In this example, we might wish to find the probability that we survive beyond age 57. This is simply the probability that we do not die by age 57.

$$P(T > 57) = 1 - F(57) = 1 - \frac{57}{100} = .43 \qquad \square$$


The probability of surviving from birth past a given age t is called 
a survival probability and denoted by S(t). 


Definition 8.1 The survival function is
$$S(t) = P(T > t) = 1 - F(t). \tag{8.4}$$
In the last example, we could have written S(57) = .43.
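Definition 8.1 can be sketched for the uniform (de Moivre) lifetime model. The function name `uniform_survival` and its default support [0, 100], chosen to match Example 8.3, are our own assumptions.

```python
def uniform_survival(t, a=0.0, b=100.0):
    """S(t) = P(T > t) = 1 - F(t) for T uniform on [a, b]."""
    if t < a:
        return 1.0
    if t > b:
        return 0.0
    return 1.0 - (t - a) / (b - a)


# Example 8.3: lifetime uniform on [0, 100]
print(round(uniform_survival(57), 2))  # 0.43
```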


1 Actuarial texts refer to this as a de Moivre distribution. 
8.1.4 The Mean and Variance of the Uniform Distribution


The mean and variance of the uniform distribution are given below. 


Uniform Distribution Mean and Variance
X uniform on [a, b]

$$E(X) = \frac{a+b}{2}$$

$$V(X) = \frac{(b-a)^2}{12}$$


We will discuss the derivation of these formulas at the end of the sec- 
tion. First we will look at some examples. 


Example 8.4 Let X be the payment time in Example 8.1, where 
X is uniform on [1,9]. Then 


$$E(X) = \frac{1+9}{2} = 5$$
and
$$V(X) = \frac{(9-1)^2}{12} = \frac{64}{12} \approx 5.33.$$

Note that the expected value of the uniform X is the midpoint of the interval [a, b]. □


Example 8.5 Let T be the time until death in Example 8.3, where T is uniform on [0, 100]. Then

$$E(T) = \frac{0+100}{2} = 50$$

and

$$V(T) = \frac{(100-0)^2}{12} = \frac{10{,}000}{12} \approx 833.33. \qquad \square$$


The formulas for the mean and the variance can be derived by 
integrating polynomials. The mean is derived below. 


$$E(X) = \int_a^b x \cdot \frac{1}{b-a}\,dx = \left.\frac{x^2}{2(b-a)}\right|_a^b = \frac{b^2-a^2}{2(b-a)} = \frac{a+b}{2}$$

To derive the variance, find $E[X^2]$ and use Equation (7.10). This is left for the reader in Exercise 8-1.
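The closed forms can be checked numerically without doing the algebra of Exercise 8-1. The sketch below approximates E(X) and E(X²) with a midpoint-rule sum (our choice of numerical method; any integrator would do) and then applies the variance identity V(X) = E(X²) − [E(X)]². The function name and the number of subintervals are assumptions for illustration.

```python
def uniform_moments(a, b, n=100_000):
    """Approximate E(X) and V(X) for X uniform on [a, b] with the midpoint rule."""
    h = (b - a) / n
    density = 1.0 / (b - a)                      # constant height of f on [a, b]
    xs = [a + (i + 0.5) * h for i in range(n)]   # midpoints of the n subintervals
    ex = sum(x * density * h for x in xs)        # E(X)   = integral of x f(x)
    ex2 = sum(x * x * density * h for x in xs)   # E(X^2) = integral of x^2 f(x)
    return ex, ex2 - ex ** 2                     # V(X) = E(X^2) - [E(X)]^2


mean, var = uniform_moments(1, 9)                # Example 8.4
print(round(mean, 4), round(var, 4))             # 5.0 5.3333
```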




8.1.5 A Conditional Probability Problem Involving the 
Uniform Distribution 


In some problems we are given information about an individual and end 
up solving conditional probability problems based on that information. 
In Example 8.3 we looked at a random variable T which represented the
lifetime of a member of a population. If you are a twenty-year-old in that 
population, you are interested in lifetime probabilities for twenty-year- 
old individuals. This requires conditional probability calculations in 
which you are given that an individual is at least twenty years old. 


Example 8.6 Let T be the lifetime random variable in Example 8.3, where T is uniform on [0, 100]. Find (a) P(T > 50 | T > 20) and (b) P(T > x | T > 20), for x in [20, 100].

Solution

(a)
$$P(T > 50 \mid T > 20) = \frac{P(T > 50 \text{ and } T > 20)}{P(T > 20)} = \frac{P(T > 50)}{P(T > 20)} = \frac{.50}{.80} = .625$$

(b) If x is any real number in the interval [20, 100], then
$$P(T > x \mid T > 20) = \frac{P(T > x \text{ and } T > 20)}{P(T > 20)} = \frac{P(T > x)}{P(T > 20)} = \frac{1 - \frac{x}{100}}{.80} = \frac{100 - x}{80}$$

The final expression in part (b) is the survival function S(x) for a random variable which is uniformly distributed on [20, 100]. This has a nice intuitive interpretation. If the lifetime of a newborn is uniformly distributed on [0, 100], the lifetime of a twenty-year-old is uniformly distributed on the remaining interval [20, 100]. □
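The conditioning argument of Example 8.6 reduces to a ratio of survival values, since for x ≥ 20 the event T > x already implies T > 20. A minimal sketch follows; the function name `conditional_survival` and its default support [0, 100] are our own choices.

```python
def conditional_survival(x, given, a=0.0, b=100.0):
    """P(T > x | T > given) for T uniform on [a, b], assuming given <= x <= b."""
    def survival(t):
        return 1.0 - (t - a) / (b - a)    # S(t) for a <= t <= b
    # For x >= given, the event {T > x} is contained in {T > given}
    return survival(x) / survival(given)


print(round(conditional_survival(50, 20), 3))  # 0.625, part (a)
print(round(conditional_survival(60, 20), 3))  # 0.5 = (100 - 60) / 80, part (b)
```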
8.2 The Exponential Distribution
8.2.1 Mathematical Preliminaries 
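The exponential distribution formula uses the exponential function $f(x) = e^x$. It is helpful to review some material from calculus. The following limit will be useful in evaluating definite integrals.

$$\lim_{x \to \infty} x^n e^{-ax} = 0, \quad \text{for } a > 0 \text{ and } n \ge 0 \tag{8.6}$$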


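Many applications will require integration of expressions of the form $x^n e^{-ax}$, from 0 to ∞, for positive a. The simplest case occurs when n = 0. In this case

$$\int_0^\infty e^{-ax}\,dx = \left.-\frac{e^{-ax}}{a}\right|_0^\infty = 0 - \left(-\frac{1}{a}\right) = \frac{1}{a}.$$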


The 0 term in the evaluation results from Equation (8.6). 
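If n = 1, we can use integration by parts with u = x and $dv = e^{-ax}\,dx$ to show that

$$\int x e^{-ax}\,dx = -\frac{x e^{-ax}}{a} - \frac{e^{-ax}}{a^2} + C.$$

This antiderivative enables us to show that, for a > 0,

$$\int_0^\infty x e^{-ax}\,dx = \left.\left(-\frac{x e^{-ax}}{a} - \frac{e^{-ax}}{a^2}\right)\right|_0^\infty = 0 - 0 - \left(0 - \frac{1}{a^2}\right) = \frac{1}{a^2}.$$

Repeated integration by parts can be used to show that

$$\int_0^\infty x^n e^{-ax}\,dx = \frac{n!}{a^{n+1}}, \quad \text{for } a > 0 \text{ and } n \text{ a positive integer.} \tag{8.7}$$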


Equation (8.7) will be used frequently. It is worth remembering. 

An interesting question is what happens to the integral in Equation 
(8.7) if n is not a positive integer. The answer to this question involves a special function Γ(x) called the gamma function. (Gamma (Γ) is a capital “G” in the classical Greek alphabet.) The gamma function is defined for n > 0 by




$$\Gamma(n) = \int_0^\infty x^{n-1} e^{-x}\,dx. \tag{8.8}$$

Equation (8.7) can be used to show that for any positive integer n,
$$\Gamma(n) = (n-1)!. \tag{8.9}$$


The gamma function is defined by an integral, and gives a value for any n > 0. If n is a positive integer, the value is (n − 1)!, but we can also evaluate it for other values of n. For example, it can be shown that

$$\Gamma\left(\tfrac{3}{2}\right) = \frac{\sqrt{\pi}}{2}.$$

If we look at the relation between the gamma function and the factorial function in Equation (8.9), we might think of the above value as the factorial of 1/2:

$$\left(\tfrac{1}{2}\right)! = \Gamma\left(\tfrac{3}{2}\right) = \frac{\sqrt{\pi}}{2}.$$
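Python's standard `math.gamma` can be used to check Equation (8.9) numerically and to evaluate the gamma function at a non-integer point. The value Γ(3/2) = √π/2 compared below is a standard identity; the loop bound of 10 is an arbitrary choice.

```python
import math

# Equation (8.9): Gamma(n) = (n - 1)! for positive integers n
for n in range(1, 11):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# A non-integer argument: Gamma(3/2) = sqrt(pi) / 2
print(math.gamma(1.5), math.sqrt(math.pi) / 2)  # both approximately 0.886227
```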


The gamma function will be used in Section 8.3 when we study the 
gamma distribution. It can be used here to give a version of Equation (8.7) that works for any n > −1.

$$\int_0^\infty x^n e^{-ax}\,dx = \frac{\Gamma(n+1)}{a^{n+1}}, \quad \text{for } a > 0 \text{ and } n > -1 \tag{8.10}$$
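Equations (8.7) and (8.10) can be sanity-checked with a numerical integral. The sketch below uses composite Simpson's rule (our choice of method) and truncates the infinite upper limit at 60/a, where the factor exp(−ax) is negligible; the helper names and the (n, a) test pairs are illustrative assumptions.

```python
import math


def simpson(f, lo, hi, n=20_000):
    """Composite Simpson's rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def gamma_integral(n, a):
    """Right-hand side of Equation (8.10): Gamma(n + 1) / a**(n + 1)."""
    return math.gamma(n + 1) / a ** (n + 1)


# (n, a) pairs: integer n as in Equation (8.7), and a non-integer n
for n, a in [(3, 2.0), (1, 0.5), (2.5, 2.0)]:
    # Truncate the infinite upper limit at 60/a, where exp(-a x) is negligible
    numeric = simpson(lambda x, n=n, a=a: x ** n * math.exp(-a * x), 0.0, 60.0 / a)
    print(n, a, numeric, gamma_integral(n, a))
```

For integer n the right-hand side equals n!/a^(n+1), so the first two pairs also check Equation (8.7).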


8.2.2 The Exponential Density: An Example 


In Section 5.3 we introduced the Poisson distribution, which gave the 
probability of a specified number of random events in an interval. The 
exponential distribution gives the probability for the waiting time 
between those Poisson events. We will introduce this by returning to the 
accident analysis in Example 5.14. The mathematical reasoning which 
shows that the waiting time in this example has an exponential distribu- 
tion will be covered in Section 8.2.9. 
Example 8.7 Accidents at a busy intersection occur at an average rate of λ = 2 per month. An analyst has observed that the number of accidents in a month has a Poisson distribution. (This was studied in Section 5.3.2.) The analyst has also observed that the time T between accidents is a random variable with density function

$$f(t) = 2e^{-2t}, \quad \text{for } t \ge 0.$$


The time T is measured in months. The shape of the density function is
given in the next graph. 


Exponential Density Function 


The graph decreases steadily, and appears to indicate that the time 
between accidents is almost always less than 2 months. We can use the 
density function to calculate the probability that the waiting time for the 
next accident is less than 2 months. 


$$P(0 \le T \le 2) = \int_0^2 2e^{-2t}\,dt = \left.-e^{-2t}\right|_0^2 = -e^{-4} + 1 \approx .98168 \qquad \square$$
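As a cross-check on the integration in Example 8.7, the sketch below compares a midpoint-rule approximation of the integral with the closed form 1 − e^(−4). The step count is an arbitrary choice, and the constant name `LAM` is ours.

```python
import math

LAM = 2.0  # accidents per month, Example 8.7


def density(t):
    """Exponential density f(t) = LAM * exp(-LAM * t) for t >= 0."""
    return LAM * math.exp(-LAM * t) if t >= 0 else 0.0


# Midpoint-rule approximation of the integral of f over [0, 2]
n = 100_000
h = 2.0 / n
approx = sum(density((i + 0.5) * h) for i in range(n)) * h

exact = 1 - math.exp(-4)   # from the antiderivative -exp(-2t) on [0, 2]
print(round(approx, 5), round(exact, 5))  # 0.98168 0.98168
```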


8.2.3 The Exponential Density Function 


The density function in the preceding section was an example of an 
exponential density function. 


Exponential Density Function 
Random variable T, parameter λ

$$f(t) = \lambda e^{-\lambda t}, \quad \text{for } t \ge 0$$

This definition of f(t) satisfies the definition of a density function, since f(t) ≥ 0 and the total area bounded by the curve and the t-axis is 1.00.

$$\int_0^\infty \lambda e^{-\lambda t}\,dt = \left.-e^{-\lambda t}\right|_0^\infty = 0 - (-1) = 1$$


In many applications the parameter λ represents the rate at which events occur in a Poisson process, and the random variable T represents the waiting time between events.² A common application of the exponential distribution is the analysis of the time until failure of a machine part.


Example 8.8 A company is studying the reliability of a part in a machine. The time T (in hours) from installation to failure of the part is a random variable. The study shows that T follows an exponential distribution with λ = .001. The probability that a part fails within 100 hours is

$$P(0 \le T \le 100) = \int_0^{100} .001e^{-.001t}\,dt = \left.-e^{-.001t}\right|_0^{100} = -e^{-.1} + 1 \approx .095. \qquad \square$$


If we replace the failure of a part by the death of a human, we can 
apply the exponential distribution to human lifetimes. We will show in 
Section 8.2.10 that the exponential distribution is not a good model for 
the length of a normal human life, but it has been used to study the 
remaining lifetime of humans with a disease. 


Example 8.9 Panjer [13] studied the progression of individuals 
who had been infected with the AIDS virus. Modern treatments have 
greatly improved the treatment of AIDS, and Panjer’s numbers are no 
longer valid for modern patients. However, for the data available in 
1988, Panjer found that the time in each stage of the disease until
progression to the next stage could be modeled by an exponential
distribution. For example, the time T (in years) from reaching the actual
Acquired Immune Deficiency Syndrome (AIDS) stage until death could
be modeled by an exponential distribution with λ = 1/.91. □


2 λ might also be described as the average number of events occurring per unit of time.
8.2.4 The Cumulative Distribution Function and Survival
Function of the Exponential Random Variable


In Example 8.8 we found the probability P(T ≤ 100). This is F(100), where F(t) is the cumulative distribution function. The cumulative distribution for any exponential random variable is derived below.

$$P(T \le t) = \int_0^t \lambda e^{-\lambda x}\,dx = \left.-e^{-\lambda x}\right|_0^t = 1 - e^{-\lambda t}, \quad \text{for } t \ge 0$$


Exponential Cumulative Distribution and Survival Functions
Random variable T, parameter λ

$$F(t) = 1 - e^{-\lambda t} \tag{8.12a}$$

$$S(t) = 1 - F(t) = e^{-\lambda t} \tag{8.12b}$$


These simple formulas make the exponential distribution an easy one to work with.


Example 8.10 Let T be the time until failure of the part in Example 8.8. T has an exponential distribution with λ = .001. Find (a) the probability that the part fails within 200 hours; (b) the probability that the part lasts for more than 500 hours.


Solution
(a) $F(200) = 1 - e^{-.001(200)} = 1 - e^{-.2} \approx .181$
(b) $S(500) = e^{-.001(500)} = e^{-.5} \approx .607$ □
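Equations (8.12a) and (8.12b) need only the exponential function, so they can be coded in a couple of lines. The sketch below (function names are our own) reproduces Examples 8.8 and 8.10.

```python
import math


def expon_cdf(t, lam):
    """F(t) = 1 - exp(-lam * t) for t >= 0 (Equation 8.12a)."""
    return 1.0 - math.exp(-lam * t) if t > 0 else 0.0


def expon_survival(t, lam):
    """S(t) = exp(-lam * t) for t >= 0 (Equation 8.12b)."""
    return math.exp(-lam * t) if t > 0 else 1.0


lam = 0.001  # failures per hour (Example 8.8)
print(round(expon_cdf(100, lam), 3))       # 0.095  (Example 8.8)
print(round(expon_cdf(200, lam), 3))       # 0.181  (Example 8.10a)
print(round(expon_survival(500, lam), 3))  # 0.607  (Example 8.10b)
```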


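8.2.5 The Mean and Variance of the Exponential Distribution


The mean and variance of the exponential distribution with parameter λ can be derived using Equation (8.7).

$$E(T) = \int_0^\infty t \cdot \lambda e^{-\lambda t}\,dt = \lambda \int_0^\infty t e^{-\lambda t}\,dt = \lambda \cdot \frac{1!}{\lambda^2} = \frac{1}{\lambda}$$

$$E(T^2) = \int_0^\infty t^2 \cdot \lambda e^{-\lambda t}\,dt = \lambda \int_0^\infty t^2 e^{-\lambda t}\,dt = \lambda \cdot \frac{2!}{\lambda^3} = \frac{2}{\lambda^2}$$

$$V(T) = E(T^2) - [E(T)]^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$$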
Exponential Distribution Mean and Variance
Random variable T, parameter λ

$$E(T) = \frac{1}{\lambda}$$

$$V(T) = \frac{1}{\lambda^2}$$


Example 8.11 Let T be the random variable for the time from reaching the AIDS stage to death in Example 8.9. T is exponential with λ = 1/.91. Then

$$E(T) = \frac{1}{\lambda} = .91$$
and
$$V(T) = \frac{1}{\lambda^2} = .91^2 = .8281. \qquad \square$$


Example 8.12 Let T be the time to failure of the machine part in Example 8.8. T is exponential with λ = .001. Then

$$E(T) = \frac{1}{\lambda} = 1000$$
and
$$V(T) = \frac{1}{\lambda^2} = 1{,}000{,}000. \qquad \square$$


Although the part in Example 8.12 has an expected life of 1000 
hours, you might not want to use it for 1000 hours if your life depended 
on it. The probability that the part fails within 1000 hours is 


$$P(T \le 1000) = F(1000) = 1 - e^{-1} \approx .632.$$


It is true for any exponential distribution that $F[E(T)] = 1 - e^{-1} \approx .632$. The reader is asked to verify this in Exercise 8-14.
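A quick numerical illustration (not a proof) of the mean and variance formulas and of the fact that F[E(T)] does not depend on λ. The rates below are those of Examples 8.7, 8.8 and 8.9; the function name is our own.

```python
import math


def expon_mean_var(lam):
    """E(T) = 1/lam and V(T) = 1/lam**2 for an exponential with rate lam."""
    return 1.0 / lam, 1.0 / lam ** 2


for lam in (0.001, 1 / 0.91, 2.0):
    mean, var = expon_mean_var(lam)
    f_at_mean = 1.0 - math.exp(-lam * mean)   # F[E(T)] = 1 - e^(-1)
    print(f"lam={lam:.4f}  E(T)={mean:.4f}  V(T)={var:.4f}  F(E(T))={f_at_mean:.3f}")
```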


8.2.6 Another Look at the Meaning of the Density Function 


We have mentioned before that density function values are not probabil- 
ities, but rather they define areas which give probabilities. We can illus- 
trate this in a new way by looking at the previous exponential graph 
from Example 8.8. At the time value t we have inserted a rectangle of 
height f(t) with a small base dt. The rectangle area is f(t) dt, and it 
approximates the area under the curve between t and t + dt. Thus

$$P(t \le T \le t + dt) \approx f(t)\,dt.$$


Exponential Density Function 


When f(t) is the density function, f (t) dt represents the probability that 
the random variable T falls in the small interval from t to t+dt. 


8.2.7 The Failure (Hazard) Rate 


We will introduce the failure rate (also called the hazard rate) by returning to the machine part failure time random variable T. Since λ = .001, the survival function is

$$S(t) = e^{-.001t}.$$


This formula is identical with the familiar formula for exponential decay 
at a rate of .001. Thus it is intuitively natural to think of the machine part 
as one member of a population which is failing at a rate of .001 per hour, 
and to refer to .001 as the failure rate of the part. 

The above reasoning is intuitive, but probability theory has a more 
careful definition of the failure rate. 


Definition 8.2 Let T be a random variable with density function f(t) and cumulative distribution function F(t). The failure rate function λ(t) is defined by

$$\lambda(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{S(t)}. \tag{8.14}$$
The failure rate can be defined for any random variable, but is simplest to understand for an exponential random variable. For the exponential distribution with parameter λ,

$$\lambda(t) = \frac{f(t)}{S(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda.$$

Thus our intuitive idea of λ = .001 as the failure rate of the machine part agrees with the probabilistic definition of the failure rate. To get a better understanding of the reasoning behind the definition of the failure rate, multiply through the defining equation for λ(t) by dt.

$$\lambda(t)\,dt = \frac{f(t)\,dt}{S(t)}$$

The numerator f(t) dt is approximately P(t < T ≤ t + dt). The denominator is P(T > t). The quotient of the two can be thought of as a conditional probability.

$$\lambda(t)\,dt \approx \frac{P(t < T \le t + dt)}{P(T > t)} = P(t < T \le t + dt \mid T > t)$$

In words, λ(t) dt is the conditional probability of failure in the next dt time units for a part that has survived to time t.
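Equation (8.14) and its conditional-probability reading can both be checked for the machine part. In the sketch below, the choice t = 500 hours and dt = 0.01 hour is arbitrary; for the exponential the ratio (S(t) − S(t + dt))/S(t) equals 1 − e^(−λ dt) at every t, which is close to λ dt.

```python
import math

LAM = 0.001  # failure rate of the machine part (per hour)


def f(t):
    """Exponential density."""
    return LAM * math.exp(-LAM * t)


def S(t):
    """Exponential survival function."""
    return math.exp(-LAM * t)


def hazard(t):
    """Failure rate lambda(t) = f(t) / S(t), Equation (8.14)."""
    return f(t) / S(t)


for t in (0, 100, 1000, 5000):
    print(t, hazard(t))   # 0.001 at every t, up to floating-point rounding

# lambda(t) dt versus the conditional probability of failing in (t, t + dt]
t, dt = 500.0, 0.01
cond = (S(t) - S(t + dt)) / S(t)   # P(t < T <= t + dt | T > t)
print(cond / dt)                   # very close to 0.001
```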

The situation for now is simple. For an exponential distribution, the failure rate is constant; it is always equal to λ. The same general definition of failure rate can lead to much more complicated functions for other random variables. The reader is asked to derive the failure rate function for the uniform distribution in Exercise 8-12.

When we look at a human being subject to death, instead of a part exposed to failure, we think of death as a hazard. In this case, we might refer to the failure (death) rate as the hazard rate. In Example 8.9, the parameter λ = 1/.91 for the exponential distribution of time to death would be referred to as a hazard rate.


8.2.8 Use of the Cumulative Distribution Function 


Once the cumulative distribution F(x) is known for a random variable X, it can be used to find the probability that X lies in any interval, since³

$$P(a < X \le b) = F(b) - F(a). \tag{8.15}$$


3 For continuous distributions, P(a ≤ X ≤ b) = P(a < X ≤ b). For discrete and mixed distributions, this need not be the case.

Equation (8.15) is true for any random variable X. For the exponential random variable, it leads to the simple formula

$$P(a \le X \le b) = e^{-\lambda a} - e^{-\lambda b}.$$
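The interval formula is a one-liner; the sketch below also confirms that it agrees with F(b) − F(a). The interval from 100 to 200 hours for the machine part of Example 8.8 is our own illustration, not a computation from the text, and the function names are hypothetical.

```python
import math


def expon_interval(a, b, lam):
    """P(a <= X <= b) = exp(-lam*a) - exp(-lam*b), for 0 <= a <= b."""
    return math.exp(-lam * a) - math.exp(-lam * b)


def expon_cdf(t, lam):
    """F(t) = 1 - exp(-lam*t) for t >= 0."""
    return 1.0 - math.exp(-lam * t)


# Illustration (not from the text): machine part of Example 8.8 failing
# between 100 and 200 hours
p = expon_interval(100, 200, 0.001)
print(round(p, 4))                                                     # 0.0861
print(math.isclose(p, expon_cdf(200, 0.001) - expon_cdf(100, 0.001)))  # True
```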


We have not emphasized the use of technology in Sections 8.1 and 
8.2 because there is little need for it in dealing with the uniform and 
exponential distributions. The probability integrals for uniform probabi- 
lities are rectangle areas, and the cumulative distribution for the 
exponential distribution is a simple exponential expression which can be 
evaluated on any scientific calculator. This situation will change in the 
following sections, where we will see much more complicated density 
functions and integrals which cannot be done in closed form. It is worth 
noting that the exponential distribution is important enough that a 
function for it is included in Microsoft® EXCEL. The function 
EXPONDIST() will calculate values of the cumulative distribution 
function of an exponential random variable. 


8.2.9 Why the Waiting Time is Exponential for Events Whose 
Number Follows a Poisson Distribution 


In Section 8.2.2 we stated that the exponential distribution gave the 
waiting time between events when the number of events followed a 
Poisson distribution. To see why this is true, we need to make one more assumption about the events in question: If the number of events in a time period of length 1 is a Poisson random variable with parameter λ, then the number of events in a time period of length t is a Poisson random variable with parameter λt.

This is a reasonable assumption. For example, if the number of 
accidents in a month at an intersection is a Poisson random variable with 
rate parameter λ = 2, then the assumption says that accidents in a two-month period will be Poisson with a rate parameter of 2λ = 4.

Using this assumption, the probability of no accidents in an 
interval of length t is 


$$P(X = 0) = \frac{e^{-\lambda t}(\lambda t)^0}{0!} = e^{-\lambda t}.$$

However, there are no accidents in an interval of length t if and only if the waiting time T for the next accident is greater than t. Thus

$$P(X = 0) = P(T > t) = S(t) = e^{-\lambda t}.$$
This is the survival function for an exponential distribution, so the waiting time T is exponential with parameter λ.
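The key step of the argument, that the Poisson probability of zero events in time t equals the exponential survival value S(t), can be checked numerically. This sketch uses the accident rate λ = 2 per month; the helper `poisson_pmf` is our own.

```python
import math


def poisson_pmf(k, mean):
    """P(N = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


LAM = 2.0  # accidents per month, as in Example 8.7
for t in (0.5, 1.0, 2.0):
    no_accidents = poisson_pmf(0, LAM * t)   # Poisson count over length t has mean LAM*t
    survival = math.exp(-LAM * t)            # exponential S(t) = P(T > t)
    print(t, no_accidents, survival)         # the two columns agree
```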


8.2.10 A Conditional Probability Problem Involving the 
Exponential Distribution 


In Section 8.1.5 we looked at a conditional probability problem involv- 
ing the uniform distribution. We can use the same kind of reasoning for 
conditional problems in which the underlying random variable is expo- 
nential. 


Example 8.13 Let T be the time to failure of the machine part in Example 8.8, where T is exponential with λ = .001. Find each of (a) P(T > 150 | T > 100) and (b) P(T > x + 100 | T > 100), for x in [0, ∞).


Solution

(a)
$$P(T > 150 \mid T > 100) = \frac{P(T > 150 \text{ and } T > 100)}{P(T > 100)} = \frac{P(T > 150)}{P(T > 100)} = \frac{e^{-.001(150)}}{e^{-.001(100)}} = e^{-.05} \approx .951$$

(b) If x is any real number in the interval [0, ∞), then
$$P(T > x + 100 \mid T > 100) = \frac{P(T > x + 100 \text{ and } T > 100)}{P(T > 100)} = \frac{P(T > x + 100)}{P(T > 100)} = \frac{e^{-.001(x+100)}}{e^{-.001(100)}} = e^{-.001x}.$$


The final expression in part (b) is the survival function S(x) for a random variable which is exponentially distributed on [0, ∞) with λ = .001. This has a nice intuitive interpretation, since we can think of x as representing hours survived past the 100th hour. If the lifetime of a new part is exponentially distributed on [0, ∞) with λ = .001, the remaining lifetime of a 100-hour-old part is also exponentially distributed on [0, ∞) with λ = .001. The lifetime random variable of the part is called memoryless, because the future lifetime of an aged part has the same distribution as the lifetime of a new part. All exponential distributions are memoryless. (Exercise 8-18 asks for a proof of this fact.) The memoryless property makes the exponential distribution a poor model for a normal human life. □
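The memoryless property can be seen numerically: the conditional survival of a 100-hour-old part matches the survival function of a new part. The values of x below are arbitrary illustration points, and the function names are ours.

```python
import math

LAM = 0.001  # failures per hour, Example 8.13


def S(t):
    """Exponential survival function S(t) = exp(-LAM * t)."""
    return math.exp(-LAM * t)


def remaining_survival(x, age):
    """P(T > x + age | T > age) = S(x + age) / S(age)."""
    return S(x + age) / S(age)


print(round(remaining_survival(50, 100), 3))  # 0.951, part (a)
for x in (10, 250, 1000):
    # part (b): conditional survival of a 100-hour-old part equals S(x)
    print(x, remaining_survival(x, 100), S(x))
```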


8.3 The Gamma Distribution 


In the following sections we will discuss a number of distributions 
which are quite useful in applications. The mathematics for these distri- 
butions is complex, and derivations of most key properties will be left 
for more advanced courses. We will focus on the application of these 
distributions in applied problems. The first of these distributions is the 
gamma distribution. 


8.3.1 Applications of the Gamma Distribution 


In Section 5.4, we showed that the geometric probability function p(x) gave the probability of x failures before the first success in a series of independent success-failure trials. In Section 5.5 we showed that the negative binomial probability function p(x) gave the probability of x failures before the rth success in a series of independent success-failure trials. The gamma distribution is related to the exponential distribution in a similar way. The exponential random variable T can be used to model the waiting time for the first occurrence of an event of interest, such as the waiting time for the next accident at an intersection. The gamma random variable X can be used to model the waiting time for the nth occurrence of the event if successive occurrences are independent. In
this section, we will use the gamma random variable as a model for the 
waiting time for a total of two accidents at an intersection. The gamma 
distribution can also be used in other problems where the exponential 
distribution is useful; examples include the analysis of failure time of a 
machine part or survival time for a disease. 

There are a number of insurance applications of the gamma distri- 
bution. The distribution has mathematical properties which make it a 
convenient model for the average rate of claims filed by different 
policyholders of an insurance company. (See, for example, page 152 of 
Herzog [4] or page 98 of Hossack et al. [6].) Bowers et al. [2] use a 
translated gamma distribution as a model for the aggregate claims of an 
insurance company. 
8.3.2 The Gamma Density Function


The density function for the gamma distribution has two parameters, α and β. It requires use of the gamma function, Γ(x), which was defined in Equation (8.8) in Section 8.2.1. The key property of the gamma function which will be needed in this section was given by Equation (8.9). For any positive integer n, Γ(n) = (n − 1)!.


Gamma Density Function
Parameters α, β > 0

$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}, \quad \text{for } x \ge 0$$


Note that for α = 1,

$$f(x) = \frac{\beta^1}{\Gamma(1)} x^0 e^{-\beta x} = \beta e^{-\beta x}.$$


This is the exponential density function, so the exponential distribution is 
a special case of the gamma distribution. 

The next figure shows the shape of the gamma density functions for β = 2 and α = 1, 2, and 4.


Gamma Density Functions 


The familiar negative exponential curve for α = 1 is clearly visible. For the higher values of α, the curve increases to a maximum and then decreases.
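The gamma density can be evaluated with `math.gamma`. This sketch (the name `gamma_pdf` is our own) checks that α = 1 gives back the exponential density and prints a few values of the β = 2 curves described above. It returns 0 for x ≤ 0, which does not affect probabilities because a single point carries probability 0.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma density beta**alpha / Gamma(alpha) * x**(alpha - 1) * exp(-beta * x).

    Evaluated for x > 0 only; returning 0 for x <= 0 is harmless for
    probabilities because a single point has probability 0.
    """
    if x <= 0:
        return 0.0
    return beta ** alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)


# alpha = 1 reproduces the exponential density beta * exp(-beta * x)
for x in (0.1, 1.0, 3.0):
    assert math.isclose(gamma_pdf(x, 1, 2.0), 2.0 * math.exp(-2.0 * x))

# A few values of the beta = 2 curves for alpha = 1, 2 and 4
for alpha in (1, 2, 4):
    print(alpha, [round(gamma_pdf(x, alpha, 2.0), 3) for x in (0.5, 1.0, 2.0)])
```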
8.3.3 Sums of Independent Exponential Random Variables


We will state without proof an important theorem which will aid us in 
understanding the application of the gamma distribution. This theorem 
will be proved using moment generating functions in Chapter 11. 


Theorem Let $X_1, X_2, \ldots, X_n$ be independent random variables, all of which have the same exponential distribution with $f(x) = \beta e^{-\beta x}$. Then the sum $X_1 + X_2 + \cdots + X_n$ has a gamma distribution with parameters α = n and β.


Example 8.14 In Example 8.7 we studied T, the time in months between accidents at a busy intersection. T was modeled as an exponential random variable with parameter β = 2. T represents the waiting time for the first accident after observation begins. If we assume that accidents occur independently, it is natural to assume that once the first accident occurs we will again have an exponential waiting time with β = 2 for the second accident. The total waiting time from the start of observation will be the sum of the waiting time for the first accident and the waiting time from the first accident until the second. In the notation of the preceding theorem,


$X_1$ is the waiting time for the first accident,
$X_2$ is the waiting time between the first and second accidents,
and, in general,
$X_i$ is the waiting time between accidents i − 1 and i.


Then $X_1 + X_2 + \cdots + X_n$ is the total waiting time for accident n. For example, $X = X_1 + X_2$ is the random variable for the waiting time from the start of observation until the second accident. According to the theorem, X has a gamma distribution with parameters α = 2 and β = 2. The density function is

$$f(x) = \frac{2^2}{\Gamma(2)} x e^{-2x} = 4x e^{-2x}, \quad \text{for } x \ge 0.$$

Its graph was given in the previous figure. We can now use this density
function to find probabilities. For example, the probability that the total 
waiting time for the second accident is between one and two months is 


$$P(1 \le X \le 2) = \int_1^2 4x e^{-2x}\,dx.$$

Using integration by parts, we can evaluate this as

$$\left.\left(-2x e^{-2x} - e^{-2x}\right)\right|_1^2 = 3e^{-2} - 5e^{-4} \approx .314. \qquad \square$$
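The integration by parts in Example 8.14 can be double-checked by comparing the antiderivative against a midpoint-rule sum. The function names and the number of subintervals are our own choices.

```python
import math


def f(x):
    """Gamma density with alpha = 2, beta = 2: f(x) = 4x exp(-2x)."""
    return 4 * x * math.exp(-2 * x)


def antiderivative(x):
    """From integration by parts: -2x exp(-2x) - exp(-2x)."""
    return -2 * x * math.exp(-2 * x) - math.exp(-2 * x)


closed_form = antiderivative(2) - antiderivative(1)   # 3e^(-2) - 5e^(-4)

n = 10_000                                            # midpoint-rule check
h = 1.0 / n
midpoint = sum(f(1 + (i + 0.5) * h) for i in range(n)) * h

print(round(closed_form, 3), round(midpoint, 3))      # 0.314 0.314
```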


8.3.4 The Mean and Variance of the Gamma Distribution 


The mean and variance of the gamma distribution can be derived using 
Equation (8.10). This is left for the exercises. 


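Gamma Distribution Mean and Variance
Parameters α, β > 0

$$E(X) = \frac{\alpha}{\beta}$$

$$V(X) = \frac{\alpha}{\beta^2}$$


Example 8.15 Let $X = X_1 + X_2$ be the random variable for the waiting time from the start of observation until the second accident in Example 8.14. X has a gamma distribution with α = 2 and β = 2. Then

$$E(X) = \frac{\alpha}{\beta} = \frac{2}{2} = 1$$
and
$$V(X) = \frac{\alpha}{\beta^2} = \frac{2}{4} = .5. \qquad \square$$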


Example 8.16 Let $Y = X_1 + X_2 + X_3 + X_4$ be the random variable for the waiting time from the start of observation until the fourth accident in Example 8.14. Y has a gamma distribution with α = 4 and β = 2. Then

$$E(Y) = \frac{4}{2} = 2$$
and
$$V(Y) = \frac{4}{2^2} = 1. \qquad \square$$
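The theorem of Section 8.3.3 and the gamma mean and variance formulas can be illustrated by simulation: adding α independent exponential waiting times with rate β should give a sample mean near α/β and a sample variance near α/β². This is a Monte Carlo sketch, so results vary slightly with the (arbitrary) seed and sample size we chose; `random.expovariate` takes the rate as its argument.

```python
import random
import statistics


def sum_of_exponentials(n, beta, trials, seed=1):
    """Simulate X1 + ... + Xn for independent exponentials with rate beta."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(beta) for _ in range(n)) for _ in range(trials)]


for alpha in (2, 4):   # Examples 8.15 and 8.16, with beta = 2
    sample = sum_of_exponentials(alpha, 2.0, trials=100_000)
    print(f"alpha={alpha}: mean {statistics.fmean(sample):.3f} vs {alpha / 2.0}, "
          f"variance {statistics.pvariance(sample):.3f} vs {alpha / 2.0 ** 2}")
```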
8.3.5 Notational Differences Between Texts


Probability textbooks are divided on notational issues. Many textbooks follow our presentation for the gamma distribution. Others replace β by 1/β, giving the alternate formulation

$$f(x) = \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^\alpha}$$

for the density function. This version leads to E(X) = αβ and V(X) = αβ². This alternate formulation may also be used for the exponential distribution. The reader needs to be aware of this difference because different versions may be used in different applied studies.


Technology Note 


Technology is very helpful when working with the gamma distribution, since integrating the gamma density function can be quite tedious for most values of α and β. Consider, for example, the gamma random variable $Y = X_1 + X_2 + X_3 + X_4$ with parameters α = 4 and β = 2 from Example 8.16. The density function is

$$f(x) = \frac{2^4}{\Gamma(4)} x^3 e^{-2x} = \frac{8}{3} x^3 e^{-2x}.$$

To find the probability P(1 ≤ Y ≤ 2), we must evaluate the integral

$$P(1 \le Y \le 2) = \int_1^2 \frac{8}{3} x^3 e^{-2x}\,dx.$$

This can be done by repeated integration by parts, but that is time consuming. The TI-83 calculator can approximate this integral in a few seconds using the function fnInt. It gives the answer .42365334. The TI-89 or TI-92 will rapidly do the integration by parts exactly. Each calculator gives the answer

$$\frac{(19e^2 - 71)e^{-4}}{3}.$$

This exact value approximated to eight places leads to the same answer given by the TI-83.
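When α is a positive integer, the gamma cumulative distribution function has a standard closed form, often called the Erlang formula, which is not derived in this section: F(x) = 1 − Σ_{k=0}^{α−1} e^(−βx)(βx)^k / k!. The sketch below uses it as a stand-in for the calculator and EXCEL computations of this Technology Note; the function name `erlang_cdf` is ours.

```python
import math


def erlang_cdf(x, alpha, beta):
    """Gamma CDF for a positive integer alpha, rate beta (the Erlang formula):
    F(x) = 1 - sum_{k=0}^{alpha-1} exp(-beta x) (beta x)^k / k!
    """
    if x <= 0:
        return 0.0
    bx = beta * x
    return 1.0 - sum(math.exp(-bx) * bx ** k / math.factorial(k) for k in range(alpha))


F2 = erlang_cdf(2, 4, 2.0)
F1 = erlang_cdf(1, 4, 2.0)
exact = (19 * math.e ** 2 - 71) * math.exp(-4) / 3
print(f"{F2:.8f} {F1:.8f} {F2 - F1:.8f} {exact:.8f}")
# 0.56652988 0.14287654 0.42365334 0.42365334
```

For non-integer α no such finite sum exists, which is one reason the text relies on numerical tools.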
Microsoft® EXCEL has a function GAMMADIST which will calculate values of the gamma cumulative distribution function. (Parameters must be entered in the alternative format of Section 8.3.5.) For the random variable Y, EXCEL gave the values

F(2) = .56652988 and F(1) = .14287654.
This gives the same answer to our problem.

$$P(1 \le Y \le 2) = F(2) - F(1) = .42365334$$


The reader may have noted that in this section the values of α and β were integers in all examples. This was done only for computational simplicity. The parameters α and β may assume any positive real values. Technology will enable us to find probabilities for any gamma random variable. This is important. For example, the Chi-square random variable used in statistical work is a gamma random variable with β = 1/2 and α = n/2, for some positive integer n.


8.4 The Normal Distribution 
8.4.1 Applications of the Normal Distribution 


The normal distribution is the most widely-used of all the distributions 
found in this text. It can be used to model the distributions of heights, 
weights, test scores, measurement errors, stock portfolio returns, 
insurance portfolio losses, and a wide range of other variables. A classic 
example of the application of the normal distribution was a study of the 
chest sizes of 5732 Scottish militiamen in 1817. (This study is nicely 
summarized in Weiss [18].) An army contractor who provided uniforms 
to the military collected the data for planning purposes. The histogram of 
chest sizes is shown in the next figure.
Chest Size of Scottish Militiamen


We can see a pattern to the histogram. The pattern is the shape of the 
normal density curve. The next figure shows the histogram with a 
normal density curve fitted to it. 


A wide range of natural phenomena follow the symmetric pattern 
observed here.⁴ People often refer to the normal density curve as a
“bell-shaped curve." The normal curve for the chest sizes is shown 
below without the histogram so that its bell shape can be seen more 
clearly. 


Normal Density Function 


4 We will see why the normal curve is so widely applicable when we discuss the Central
Limit Theorem in Section 8.4.4. 
Every normal density curve has this shape, and the normal density
model is used to find probabilities for all of the natural phenomena 
whose histograms display this pattern. Random variables whose histo- 
grams are well-approximated by a normal density curve are called 
approximately normal. The distribution of chest sizes of Scottish mili- 
tiamen is approximately normal. 


8.4.2 The Normal Density Function
The normal density function has two parameters, μ and σ. The function is difficult to integrate, and we will not find normal probabilities by integration in closed form.


Normal Density Function
Parameters μ and σ

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad \text{for } -\infty < x < \infty \tag{8.18}$$


It can be shown that μ = E(X) and σ² = V(X). (Derivations of E(X) and V(X) will be given in Section 9.2.3.)


Normal Distribution Mean and Variance
Parameters μ and σ

E(X) = μ
V(X) = σ²


Example 8.17 The chest sizes of Scottish militiamen in 1817 
were approximately normal with μ = 39.85 and σ = 2.07. The density
function is graphed in the preceding figure. □


Example 8.18 The SAT aptitude examinations in English and
Mathematics were originally designed so that scores would be approxi-
mately normal with μ = 500 and σ = 100. □


Note that in each of the previous examples we gave the value of 
the standard deviation σ rather than the variance σ². This is the usual
practice when dealing with the normal distribution. 
8.4.3 Calculation of Normal Probabilities; The Standard
Normal


Suppose we are looking at a national examination whose scores X are approximately normal with μ = 500 and σ = 100. If we wish to find the probability that a score falls between 600 and 750, we must evaluate a difficult integral.

$$P(600 \le X \le 750) = \int_{600}^{750} \frac{1}{\sqrt{2\pi}\cdot 100}\, e^{-\frac{(x-500)^2}{20{,}000}}\,dx$$


This cannot be done in closed form using the standard techniques of 
calculus, but it can be approximated using numerical methods. We did 
this using the fnInt operation on the TI-83 calculator, and found that the 
answer was approximately .152446. 
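The numerical integration done on the TI-83 can be mimicked in Python. The sketch below applies composite Simpson's rule (our choice; the calculator's fnInt uses its own method) to the density in Equation (8.18) and reproduces the value .152446.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density, Equation (8.18)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def simpson(f, lo, hi, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def exam_density(x):
    return normal_pdf(x, 500, 100)   # national examination scores, mu = 500, sigma = 100


print(round(simpson(exam_density, 600, 750), 6))  # 0.152446
```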

We will discuss use of technology in more detail at the end of this 
section. Until recently, numerical integration was not readily available to 
most people, so another way of finding normal probabilities involving 
tables of areas for a standard normal distribution was developed. It is 
still the most common way of finding normal probabilities. In the rest of 
this section we will cover this method, and the basic properties of 
normal distributions which are behind it, in a series of steps. We begin 
with an important property of normal distributions which is stated with- 
out a complete proof. 


Step 1: Linear transformation of normal random variables. Let X be a normal random variable with mean μ and standard deviation σ. Then for constants a ≠ 0 and b, the transformed random variable Y = aX + b is also normal, with mean aμ + b and standard deviation |a|σ.
The crucial statement which is not proved here is the assertion that Y is also normal. This will be proved using moment generating functions in Section 9.2.3. We can easily derive the mean and variance of Y:

\[ E(aX + b) = a\,E(X) + b = a\mu + b \]
\[ V(aX + b) = a^2\,V(X) = a^2\sigma^2 \]
\[ \sigma_Y = \sqrt{a^2\sigma^2} = |a|\,\sigma \]
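Python's standard-library `statistics.NormalDist` (Python 3.8+) builds in this linear-transformation rule. Multiplying or adding a constant returns a new normal distribution with mean aμ + b and standard deviation |a|σ. The sketch below relies on that library behavior. It illustrates the rule and does not prove normality. The values a = −0.5 and b = 10 are arbitrary illustrative choices.

```python
from statistics import NormalDist

# X is normal with mean 500 and standard deviation 100 (the exam-score model).
X = NormalDist(500, 100)

# Y = aX + b with a negative slope: the mean becomes a*mu + b and the
# standard deviation becomes |a|*sigma (never negative).
a, b = -0.5, 10
Y = a * X + b
print(Y.mean, Y.stdev)   # -240.0 50.0

# The standardizing transformation (X - mu)/sigma is the special case
# a = 1/sigma, b = -mu/sigma; it produces mean 0 and standard deviation 1.
Z = (X - 500) / 100
print(Z.mean, Z.stdev)   # 0.0 1.0
```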
Step 2: Transformation to a standard normal. Using the linear transformation property of normal random variables, we can transform any normal random variable X with mean μ and standard deviation σ into a standard normal random variable Z with mean 0 and standard deviation 1. The linear transformation that is used to do this is

\[ Z = \frac{X - \mu}{\sigma} = \frac{1}{\sigma}X - \frac{\mu}{\sigma} \qquad (8.20) \]

Note that this is the transformation used to define the z-score in Section 4.4.4. The linear transformation property tells us that Z is normal, with

\[ E(Z) = \frac{1}{\sigma}E(X) - \frac{\mu}{\sigma} = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0 \]

and

\[ \sigma_Z = \frac{1}{\sigma}\,\sigma = 1. \]
The standard normal random variable Z has a density function which is somewhat simpler in appearance. This density function still requires numerical integration, but it will be the only density function we need to integrate to find normal probabilities.


Standard Normal Density Function
Parameters μ = 0 and σ² = σ = 1

\[ f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty \]

The density function for the distribution of Z is shown in the next figure.

[Figure: Standard Normal Density Function]


Step 3: Using z-tables. Tables of areas under the density curve for the distribution of Z have been constructed for use in probability calculations. In Appendix A, we have provided a table of values of the cumulative distribution function for Z, F_Z(z) = P(Z ≤ z). The left-hand column of the table gives the value of z to one decimal place and the upper row gives the second decimal place for z. The areas F_Z(z) are found in the body of the table. Below we have reproduced a small part of the table. The value F_Z(1.28) = .8997 is found in the row labeled 1.2 and the column labeled 0.08.


Second Decimal Place in z

 z  | 0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
----+----------------------------------------------------------------------
0.0 | 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 | 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 | 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 | 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 | 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 | 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 | 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 | 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 | 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 | 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 | 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 | 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 | 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 | 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 | 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319


The table tells us that

\[ P(Z \le 1.28) = F_Z(1.28) = .8997. \]

Using the negation rule, we see that
\[ P(Z > 1.28) = 1 - .8997 = .1003. \]


We can also calculate the probability that Z falls in an interval. For example,

\[ P(1 \le Z \le 2.5) = F_Z(2.50) - F_Z(1.00) = .9938 - .8413 = .1525. \]
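The table lookups above can be imitated in code. The sketch computes F_Z from the identity F_Z(z) = ½(1 + erf(z/√2)) using `math.erf`. It then mimics a printed table by rounding z to two places and the area to four. `table_lookup` is a hypothetical helper, not a function from the text.

```python
import math

def std_normal_cdf(z):
    """F_Z(z) = P(Z <= z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_lookup(z):
    """Mimic a printed z-table: z rounded to 2 places, area to 4 places."""
    return round(std_normal_cdf(round(z, 2)), 4)

print(table_lookup(1.28))                                # 0.8997
print(round(1 - table_lookup(1.28), 4))                  # 0.1003
print(round(table_lookup(2.5) - table_lookup(1.0), 4))   # 0.1525
```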


Step 4: Finding probabilities for any normal X. Once we know how to find probabilities for Z, we can use the transformation given by Equation (8.20) to find probabilities for any normal random variable X with mean μ and standard deviation σ, using the identity

\[ P(x_1 \le X \le x_2) = P\left(\frac{x_1 - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{x_2 - \mu}{\sigma}\right) = P(z_1 \le Z \le z_2), \]

where
\[ z_1 = \frac{x_1 - \mu}{\sigma} \quad\text{and}\quad z_2 = \frac{x_2 - \mu}{\sigma}. \]
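The Step 4 identity translates directly into a short function. This is a sketch using `math.erf` for F_Z. The function name `normal_interval_prob` is our own. Infinite endpoints are allowed, so one-sided probabilities use the same call. Because the code does not round z or the table entries, it reproduces the numerical-integration value .152446 rather than the table's .1525.

```python
import math

def std_normal_cdf(z):
    """F_Z(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval_prob(x1, x2, mu, sigma):
    """P(x1 <= X <= x2) for X normal(mu, sigma), via z1 and z2 (Step 4)."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    return std_normal_cdf(z2) - std_normal_cdf(z1)

# Exam scores with mu = 500, sigma = 100
print(round(normal_interval_prob(600, 750, 500, 100), 6))         # 0.152446
print(round(normal_interval_prob(-math.inf, 600, 500, 100), 4))   # 0.8413
print(round(normal_interval_prob(750, math.inf, 500, 100), 4))    # 0.0062
```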


Example 8.19 The national examination scores X in Example 8.18 were normally distributed with μ = 500 and σ = 100. Then the probability of a score in the interval [600, 750] is

\[ \begin{aligned} P(600 \le X \le 750) &= P\left(\frac{600 - 500}{100} \le \frac{X - 500}{100} \le \frac{750 - 500}{100}\right) \\ &= P(1 \le Z \le 2.5) \\ &= F_Z(2.50) - F_Z(1.00) \\ &= .9938 - .8413 = .1525. \end{aligned} \]

We might also calculate
\[ P(X \le 600) = F_Z(1.00) = .8413, \]
\[ P(X \le 400) = F_Z(-1.00) = .1587, \]

and

\[ P(X > 750) = 1 - F_Z(2.50) = 1 - .9938 = .0062. \] □




The observant reader will note that we previously calculated the probability P(600 ≤ X ≤ 750) by numerical integration of the density function and got an answer of .1524 (more precisely, .152446), not the .1525 found above. The difference comes from rounding: each z-value is rounded to two places and each entry in the table is rounded to four places. This rounding can produce small inaccuracies in the last decimal place of answers found using the tables.


Example 8.20 The chest sizes of Scottish militiamen in 1817 were approximately normally distributed with μ = 39.85 and σ = 2.07. Find the probability that a randomly selected militiaman had a chest size in the interval [38, 42].

Solution

\[ \begin{aligned} P(38 \le X \le 42) &= P\left(\frac{38 - 39.85}{2.07} \le \frac{X - 39.85}{2.07} \le \frac{42 - 39.85}{2.07}\right) \\ &= P(-0.89 \le Z \le 1.04) \\ &= F_Z(1.04) - F_Z(-0.89) \\ &= .8508 - .1867 = .6641 \end{aligned} \] □


Technology Note 


Calculation of normal probabilities using z-tables is not as quick or convenient as direct calculator use. The probability P(38 ≤ X ≤ 42) from Example 8.20 can be found in seconds on the TI-83, which has a special function for normal probabilities. The function, normalcdf, is found in the DISTR menu. Entering

normalcdf(38, 42, 39.85, 2.07)

will give the answer .6648 to 4 places. Note that this answer is not identical with the less-accurate answer obtained from table use. If we wish an independent check on this answer, we could use the TI-92 to do the integral

\[ P(38 \le X \le 42) = \int_{38}^{42} \frac{1}{\sqrt{2\pi}\cdot 2.07}\, e^{-(x-39.85)^2/(2(2.07)^2)}\, dx. \]

The answer is .6648 to four places. The calculator is using numerical methods to approximate the probability to a higher degree of accuracy than is possible using the tables.

Microsoft® EXCEL has a NORMDIST() function which will calculate values of either the density function f(x) or the cumulative distribution function F(x). Using EXCEL,

\[ P(38 \le X \le 42) = F(42) - F(38) = .8505 - .1857 = .6648. \]
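Python's standard library offers a counterpart to normalcdf and NORMDIST in `statistics.NormalDist.cdf`. The sketch below redoes Example 8.20 both ways, first with the unrounded CDF and then with the two-place z and four-place areas of the table method. It assumes Python 3.8 or later.

```python
from statistics import NormalDist

chest = NormalDist(mu=39.85, sigma=2.07)   # Example 8.20
F42, F38 = chest.cdf(42), chest.cdf(38)
print(round(F42, 4), round(F38, 4), round(F42 - F38, 4))  # 0.8505 0.1857 0.6648

# Table method: round z to two places and each area to four places.
Z = NormalDist()
z1 = round((38 - 39.85) / 2.07, 2)
z2 = round((42 - 39.85) / 2.07, 2)
table = round(Z.cdf(z2), 4) - round(Z.cdf(z1), 4)
print(z1, z2, round(table, 4))   # -0.89 1.04 0.6641
```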


Although modern technology is quicker and more accurate than 
use of z-tables, we will continue to find normal probabilities using the 
table method in this text. The old method is so widely used that it must 
be learned for use in standardized examinations which do not allow 
powerful calculators, and for use in other probability and statistics 
courses. 


z-scores are useful for purposes other than table calculation. In Chapter 4 we observed that a z-value gives a distance from the mean in standard deviation units. Thus for the national examination with μ = 500 and σ = 100, a student with an exam score of x = 750 and a transformed value of z = 2.5 can be described as being "2.5 standard deviations above the mean." This is a useful type of description.


8.4.4 Sums of Independent, Identically Distributed, 
Random Variables 


Sums of random variables will be fully covered in Chapter 11. A brief discussion here may help the reader to have a greater appreciation of the usefulness of the normal distribution. We will use the loss severity random variable X of Examples 7.2, 7.10 and 7.15 to illustrate the need for adding random variables. The random variable X represented the loss on a single insurance policy. It was not normally distributed. We found that

\[ E(X) = \frac{1000}{3} \quad\text{and}\quad V(X) = \frac{500{,}000}{9}. \]

We also found probabilities for X. However, this information applies only to a single policy. The company selling insurance has more than one policy, and must look at its total business. Suppose that the company has 1000 policies. The company is willing to assume that all of the policies are independent, and that each is governed by the same (non-normal) distribution given in Example 7.2. Then the company is really responsible for 1000 random variables, X₁, X₂, ..., X₁₀₀₀. The total claim loss S for the company is the sum of the losses on all the individual policies.

\[ S = X_1 + X_2 + \cdots + X_{1000} \]


There is a key theorem, called the Central Limit Theorem, which shows that this important sum is approximately normal, even though the individual policies Xᵢ are not.


Central Limit Theorem Let X₁, X₂, ..., Xₙ be independent random variables, all of which have the same probability distribution and thus the same mean μ and variance σ². If n is large⁵, the sum

\[ S = X_1 + X_2 + \cdots + X_n \]

will be approximately normal with mean nμ and variance nσ².


This theorem shows that the total loss S = X₁ + X₂ + ··· + X₁₀₀₀ will be approximately normal with mean and variance equal to 1000 times the original mean and variance.

\[ E(S) = 1000 \cdot \frac{1000}{3} \qquad\text{and}\qquad V(S) = 1000 \cdot \frac{500{,}000}{9} \]


This means that even though the original single claim distribution is not normal, the normal distribution probability methods can be used to find probabilities for the total claim loss of the company. Suppose the company wishes to find the probability that total claims S were less than $350,000. We know that S is approximately normal, and the calculations for E(S) and V(S) show that

\[ \mu_S = 333{,}333.33 \quad\text{and}\quad \sigma_S = 7453.56. \]


5 How large n must be depends on how close the original distribution is to the normal. Some elementary statistics books define n > 30 as "large", but this will not always be the case.
Then we can use z-tables to find

\[ P(S \le 350{,}000) = P\left(\frac{S - 333{,}333.33}{7453.56} \le \frac{350{,}000 - 333{,}333.33}{7453.56}\right) = P(Z \le 2.24) = F_Z(2.24) = .9875. \]
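The Central Limit Theorem step reduces to a few lines of arithmetic. The sketch below uses only the per-policy values E(X) = 1000/3 and V(X) = 500,000/9 quoted above. It works with the unrounded z ≈ 2.236 instead of the table's 2.24, so it should report about .9873 rather than the table answer .9875.

```python
import math
from statistics import NormalDist

n = 1000
mean_X = 1000 / 3          # E(X) for one policy (Example 7.2)
var_X = 500_000 / 9        # V(X) for one policy

mean_S = n * mean_X                 # E(S) = n * mu
sd_S = math.sqrt(n * var_X)         # sigma_S = sqrt(n * sigma^2)
z = (350_000 - mean_S) / sd_S       # standardize the $350,000 threshold

print(round(mean_S, 2), round(sd_S, 2))   # 333333.33 7453.56
print(round(z, 2))                        # 2.24
print(round(NormalDist().cdf(z), 4))      # unrounded z, so about .9873
```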


This shows the company that it is not likely to need more than $350,000 
to pay claims, which is helpful in planning. In general, the normal 
distribution is quite valuable because it applies in so many situations 
where independent and identical components are being added. 

The Central Limit Theorem enables us to understand why so many 
random variables are approximately normally distributed. This occurs 
because many useful random variables are themselves sums of other 
independent random variables. 


8.4.5 Percentiles of the Normal Distribution 


The percentiles of the standard normal can be determined from the tables. For example,

\[ P(Z \le 1.96) = .975 \]
Thus the 97.5th percentile of the Z distribution is 1.96.


The 90th, 95th and 99th percentiles are often asked for in problems. They are listed for the standard normal distribution below, together with several other commonly used percentiles.

z        | 0.842 | 1.036 | 1.282 | 1.645 | 1.960 | 2.326 | 2.576
P(Z ≤ z) | 0.800 | 0.850 | 0.900 | 0.950 | 0.975 | 0.990 | 0.995


If X is a normal random variable with mean μ and standard deviation σ, then we can easily find x_p, the 100p-th percentile of X, using the 100p-th percentile of Z and the basic relationship of X and Z:

\[ z_p = \frac{x_p - \mu}{\sigma} \quad\Longrightarrow\quad x_p = \mu + z_p\,\sigma. \]


For example, if X is a standard test score random variable with mean μ = 500 and standard deviation σ = 100, then the 99th percentile of X is

\[ x_{.99} = \mu + z_{.99}\,\sigma = 500 + 2.326(100) = 732.6. \]
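Percentiles are the inverse of the CDF. The sketch below uses `NormalDist.inv_cdf` from Python's standard library (3.8+) to rebuild the percentile table and then applies x_p = μ + z_p·σ. The unrounded z_.99 ≈ 2.3263 still gives 732.6 to one decimal.

```python
from statistics import NormalDist

Z = NormalDist()
levels = (0.80, 0.85, 0.90, 0.95, 0.975, 0.99, 0.995)
for p in levels:
    print(p, round(Z.inv_cdf(p), 3))   # reproduces the z row of the table

mu, sigma = 500, 100
x_99 = mu + Z.inv_cdf(0.99) * sigma    # x_p = mu + z_p * sigma
print(round(x_99, 1))                  # 732.6
```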
8.4.6 The Continuity Correction


When the normal model is used to approximate a discrete distribution 
(such as integer test scores), you might be asked to apply the continuity 
correction. This is covered in detail in basic statistics courses. 


If you are finding P(a ≤ X ≤ b) for a normal random variable X, the continuity correction merely decreases the lower limit by 0.5 and raises the upper limit by 0.5. Suppose, for example, that for the test score random variable in Example 8.19 you wanted to find the probability that a score was in the range from 500 to 700. Without the continuity correction you would calculate

\[ P(500 \le X \le 700) = P\left(\frac{500 - 500}{100} \le Z \le \frac{700 - 500}{100}\right) = P(0 \le Z \le 2) = .9772 - .5 = .4772 \]


With the continuity correction you would calculate

\[ P(499.5 \le X \le 700.5) = P\left(\frac{499.5 - 500}{100} \le Z \le \frac{700.5 - 500}{100}\right) = P(-.005 \le Z \le 2.005). \]

Your tables for Z do not go to three places. If you rounded to two places you would get

\[ P(-.01 \le Z \le 2.01) = .9778 - .4960 = .4818. \]


In this example the use of the continuity correction would make no difference in your final answer if exam choices are rounded to two places: each method would give you .48. You should use the continuity correction if you are instructed to in an exam question or if σ is small enough that the change of .5/σ would change the second place in your z-score.
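The comparison can also be made without rounding z. The sketch below uses `statistics.NormalDist` to compute the uncorrected and corrected probabilities directly. The corrected value comes out near .4795. That differs from the .4818 above, which came from rounding z to two places. Both values still round to .48.

```python
from statistics import NormalDist

scores = NormalDist(500, 100)   # test scores, mu = 500, sigma = 100

# Without the correction: P(500 <= X <= 700)
plain = scores.cdf(700) - scores.cdf(500)
# With the correction: move each limit out by 0.5, P(499.5 <= X <= 700.5)
corrected = scores.cdf(700.5) - scores.cdf(499.5)

print(round(plain, 4), round(corrected, 4))   # 0.4772 0.4795
print(round(plain, 2), round(corrected, 2))   # 0.48 0.48
```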


You can review the continuity correction in introductory texts such as Introductory Statistics (Seventh Edition) by Neil Weiss, Pearson Addison-Wesley, 2005.
8.5 The Lognormal Distribution
8.5.1 Applications of the Lognormal Distribution


Although the normal distribution is very useful, it does not fit every 
situation. The normal distribution curve is symmetric, and this is not 
appropriate for some real phenomena such as insurance claim severity or 
investment returns. The lognormal distribution curve has a shape that 
is not symmetric and fits the last two phenomena fairly well. The next 
figure shows the lognormal curve for a claim severity problem which 
will be examined in Example 8.21. 


[Figure: Lognormal Density Function]

This curve gives the highest probability to claims in a range around x = 1000, but does give a non-zero probability to much higher claim amounts.

The use of the lognormal distribution as a model for claim severity 
in insurance is discussed by Hossack et al. [6]. The reader interested in 
using the lognormal to model investment returns should see page 187 of 
Bodie et al. [1], or page 281 of Hull [7]. 


8.5.2 Defining the Lognormal Distribution 


A random variable is called lognormal if its natural logarithm is 
normally distributed. This is said in a slightly different way in the usual 
definition of the lognormal. 


Definition 8.3 A random variable Y is lognormal if Y = e^X for some normal random variable X with mean μ and standard deviation σ.

Example 8.21 Let X be a normal random variable with μ = 7 and σ = 0.5. Y = e^X is the lognormal random variable whose density curve is shown in the last figure. The shape of the curve makes it a reasonable model for some insurance claim analyses. □


The density function of a lognormal distribution is given below.

Density Function for Lognormal Y = e^X
X normal with mean μ and standard deviation σ

\[ f(y) = \frac{1}{y\,\sigma\sqrt{2\pi}}\, e^{-(\ln y - \mu)^2/(2\sigma^2)}, \quad y > 0 \qquad (8.22) \]


This function is difficult to work with, but we will not need it. We will 
show how to find lognormal probabilities using normal probabilities in 
Section 8.5.3. 

Note that the parameters μ and σ represent the mean and standard deviation of the normal random variable X which appears in the exponent. The mean and variance of the actual lognormal distribution Y are given below.

Mean and Variance for Lognormal Y = e^X
X normal with mean μ and standard deviation σ

\[ E(Y) = e^{\mu + \sigma^2/2} \qquad (8.23a) \]
\[ V(Y) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) \qquad (8.23b) \]


Example 8.22 Let X be a normal random variable with μ = 7 and σ = 0.5, and let Y = e^X as in Example 8.21.

\[ E(Y) = e^{7 + (0.5)^2/2} = e^{7.125} \approx 1242.65 \]
\[ V(Y) = e^{2(7) + (0.5)^2}\left(e^{(0.5)^2} - 1\right) = e^{14.25}\left(e^{0.25} - 1\right) \approx 438{,}584.80 \]

If we think of Y as a model for insurance claim amounts, the mean claim amount is $1,242.65. □
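Equations (8.23a) and (8.23b) are easy to check numerically. The sketch below is a direct transcription of the two formulas. The function name `lognormal_mean_var` is a hypothetical helper.

```python
import math

def lognormal_mean_var(mu, sigma):
    """E(Y) and V(Y) for Y = e^X with X normal(mu, sigma): (8.23a), (8.23b)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    var = math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)
    return mean, var

m, v = lognormal_mean_var(7, 0.5)
print(round(m, 2), round(v, 2))   # about 1242.65 and 438,584.8
```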
8.5.3 Calculating Probabilities for a Lognormal Random Variable


We do not need to integrate the density function for the lognormal random variable Y. The cumulative distribution function can be found directly from the cumulative distribution for the normally distributed exponent X. For c > 0,

\[ F_Y(c) = P(Y \le c) = P(e^X \le c) = P(X \le \ln c) = F_X(\ln c). \]

Example 8.23 Suppose the random variable Y of Examples 8.21 and 8.22 is used as a model for claim amounts. We wish to find the probability of the occurrence of a claim greater than $1300. Since X is normal with μ = 7 and σ = 0.5, we can use z-tables. The probability of a claim less than or equal to 1300 is

\[ \begin{aligned} P(Y \le 1300) &= P(e^X \le 1300) = P(X \le \ln 1300) \\ &= P\left(Z \le \frac{\ln 1300 - 7}{0.5}\right) = P(Z \le 0.34) = .6331. \end{aligned} \]

The probability of a claim greater than 1300 is
\[ 1 - P(Y \le 1300) = 1 - .6331 = .3669. \] □


Technology Note 


Microsoft® EXCEL has a function LOGNORMDIST() which calculates values of the cumulative distribution function for a given lognormal. For the preceding example, EXCEL gives the answers

\[ P(Y \le 1300) = .6331617 \quad\text{and}\quad P(Y > 1300) = .3668383. \]


Note the difference from the z-table answer in the fourth decimal place. Recall that EXCEL will give more accurate normal probabilities than the z-table method. (The TI-83 gives the same answer as EXCEL when used to calculate P(X ≤ ln 1300) for the normal X with μ = 7 and σ = 0.5.)
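The relation F_Y(c) = F_X(ln c) means a lognormal CDF needs only a normal CDF. This sketch uses `statistics.NormalDist`, and its unrounded answer should agree with the EXCEL value .6331617 to about five places. It also repeats the two-place table method for comparison. `lognormal_cdf` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

# Exponent X is normal with mu = 7, sigma = 0.5 (Examples 8.21-8.23).
X = NormalDist(mu=7, sigma=0.5)

def lognormal_cdf(c):
    """F_Y(c) = P(e^X <= c) = F_X(ln c), valid for c > 0."""
    return X.cdf(math.log(c))

p_le = lognormal_cdf(1300)
p_gt = 1 - p_le
print(round(p_le, 5), round(p_gt, 5))   # about 0.63316 and 0.36684

# Table method: round z to two places, then read a four-place area.
z = (math.log(1300) - 7) / 0.5
table_value = round(NormalDist().cdf(round(z, 2)), 4)
print(round(z, 2), table_value)          # 0.34 0.6331
```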
8.5.4 The Lognormal Distribution for a Stock Price


The value of a single stock at some future point in time is a random 
variable. The lognormal distribution gives a reasonable probability model 
for this random variable. This is due to the fact that the exponential 
function is used to model continuous growth. 


Continuous Growth Model
Value of asset at time t if growth is continuous at rate r

\[ A(t) = A(0)\, e^{rt} \qquad (8.24) \]


Example 8.24 A stock was purchased for A(0) = 100. Its value grows at a continuous rate of 10% per year. What is its value in (a) 6 months; (b) one year?

Solution
(a) \[ A(.5) = 100e^{.10(.5)} \approx 105.13 \]
(b) \[ A(1) = 100e^{.10} \approx 110.52 \] □


In the last example, the stock is known to have grown at a given rate of 10% over a time period in the past. When we look to the future, the rate of growth X is a random variable. If we assume that X is normally distributed, then the future value Y = 100 · e^X is a multiple of a lognormal random variable.


Example 8.25 A stock was purchased for A(0) = 100. Its value will grow at a continuous rate X which is normal with mean μ = .10 and standard deviation σ = .03. Then the value of the stock in one year is the random variable Y = 100e^X, where e^X is lognormal. □
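Below is a short sketch of Examples 8.24 and 8.25. It evaluates the growth formula (8.24) and then applies Equation (8.23a) to the random one-year value Y = 100e^X. The text does not carry out that last calculation, so treat the printed expected value (about 110.57) as our own application of the formula. `grow` is a hypothetical helper.

```python
import math

def grow(a0, rate, t):
    """Continuous growth A(t) = A(0) * e^(r t), Equation (8.24)."""
    return a0 * math.exp(rate * t)

print(round(grow(100, 0.10, 0.5), 2), round(grow(100, 0.10, 1), 2))  # 105.13 110.52

# Example 8.25: the one-year rate X is normal(mu = .10, sigma = .03), so
# Y = 100 e^X and, by Equation (8.23a), E(Y) = 100 * e^(mu + sigma^2 / 2).
mu, sigma = 0.10, 0.03
expected_value = 100 * math.exp(mu + sigma ** 2 / 2)
print(round(expected_value, 2))   # about 110.57
```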


The use of the lognormal distribution for a stock price is discussed 
in more detail by Hull [7].⁶


6 See page 281. 
8.6 The Pareto Distribution


8.6.1 Application of the Pareto Distribution


In Section 8.5 the lognormal distribution was used to model the amounts 
of insurance claims. The Pareto distribution can also be used to model 
certain insurance loss amounts. The next figure shows the graph of a 
Pareto density function for loss amounts measured in hundreds of dollars 
(i.e., a claim of $300 is represented by x = 3). 


[Figure: Pareto Density Function]

Note that the distribution starts at x = 3. This insurance policy has a deductible of $300. The insurance company pays the loss amount minus $300. Thus claims for $300 or less are not filed and the only losses of interest are those for more than $300.


8.6.2 The Density Function of the Pareto Random Variable 


The Pareto distribution has a number of different equivalent formulations. The one we have chosen involves two constants, α and β.

Pareto Density Function
Constants α and β

\[ f(x) = \frac{\alpha}{\beta}\left(\frac{\beta}{x}\right)^{\alpha+1}, \quad \alpha > 2, \; x \ge \beta > 0 \qquad (8.25) \]

7 The Pareto density function can be defined for α > 0, but the restriction that α > 2 guarantees the existence of the mean and variance.
Example 8.26 The Pareto density in the previous figure has α = 2.5 and β = 3. The density curve is

\[ f(x) = \frac{2.5}{3}\left(\frac{3}{x}\right)^{3.5}, \quad x \ge 3. \] □

Note that the value of β must be set in advance to define the domain of the density function. Once β is set, the value of α can vary. The Pareto distribution shown here is often referred to as a single parameter Pareto distribution with parameter α. There is a different Pareto distribution called the two parameter Pareto distribution. We will not cover the two parameter distribution in this text, but it is useful to know that the term "Pareto distribution" can refer to different things.


8.6.3 The Cumulative Distribution Function; Evaluating Probabilities


In dealing with the normal and lognormal distributions we had density 
functions which were difficult to integrate in closed form, and numerical 
integration was used for evaluation of F(x). Since the Pareto distribu- 
tion has a density which is a power function, F(x) can be easily found. 
The details are left for the reader in Exercise 8-42. 


Pareto Cumulative Distribution Function
Parameters α and β

\[ F(x) = 1 - \left(\frac{\beta}{x}\right)^{\alpha}, \quad \alpha > 2, \; x \ge \beta > 0 \qquad (8.26) \]


Once F(x) is known, it can be used to find probabilities for a Pareto random variable. There is no need for further integration.

Example 8.27 The Pareto random variable in Example 8.26 had α = 2.5 and β = 3. The cumulative distribution function is

\[ F(x) = 1 - \left(\frac{3}{x}\right)^{2.5}, \quad x \ge 3. \]

If the random variable X represents a loss amount in hundreds of dollars, find the probability that a loss is (a) between $400 and $600; (b) greater than $1000.
Solution

(a) \[ P(4 \le X \le 6) = F(6) - F(4) = \left(\frac{3}{4}\right)^{2.5} - \left(\frac{3}{6}\right)^{2.5} \approx .3104 \]
(b) \[ P(X > 10) = S(10) = 1 - F(10) = \left(\frac{3}{10}\right)^{2.5} \approx .0493 \] □
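The Pareto CDF (8.26) turns Example 8.27 into two subtractions. The sketch below is a literal transcription, with losses measured in hundreds of dollars as in the text. `pareto_cdf` is a hypothetical helper.

```python
def pareto_cdf(x, alpha, beta):
    """F(x) = 1 - (beta/x)^alpha for x >= beta, Equation (8.26); 0 below beta."""
    if x < beta:
        return 0.0
    return 1.0 - (beta / x) ** alpha

alpha, beta = 2.5, 3       # losses in hundreds of dollars
p_a = pareto_cdf(6, alpha, beta) - pareto_cdf(4, alpha, beta)   # $400 to $600
p_b = 1.0 - pareto_cdf(10, alpha, beta)                         # above $1000
print(round(p_a, 4), round(p_b, 4))   # 0.3104 0.0493
```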
8.6.4 The Mean and Variance of the Pareto Distribution


The mean and variance of the Pareto distribution can be obtained by 
straightforward integration of power functions. This is left for the exer- 
cises. 


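Pareto Distribution Mean and Variance
Parameters α and β

\[ E(X) = \frac{\alpha\beta}{\alpha - 1} \]
\[ V(X) = \frac{\alpha\beta^2}{\alpha - 2} - \left(\frac{\alpha\beta}{\alpha - 1}\right)^2 \]

Example 8.28 The Pareto random variable in Example 8.26 had α = 2.5 and β = 3. The mean and variance are

\[ E(X) = \frac{(2.5)(3)}{1.5} = 5 \]

and

\[ V(X) = \frac{(2.5)(3^2)}{0.5} - \left(\frac{(2.5)(3)}{1.5}\right)^2 = 45 - 25 = 20. \] □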


Note that if we look at X as a loss amount in hundreds of dollars, Example 8.28 says that the expected loss is $500. However, we have interpreted the insurance modeled as insurance for the loss less a deductible of $300. The random variable for the amount paid on a single claim is X − 3. Thus the expected amount of a single claim is

\[ E(X - 3) = E(X) - 3 = 2, \]

that is, $200.
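The sketch below transcribes the mean and variance formulas and the deductible adjustment E(X − 3) = E(X) − 3. It rejects α ≤ 2, in line with the text's restriction that guarantees a finite variance. `pareto_mean_var` is a hypothetical helper.

```python
def pareto_mean_var(alpha, beta):
    """E(X) and V(X) for the Pareto density (8.25); requires alpha > 2."""
    if alpha <= 2:
        raise ValueError("the variance is finite only when alpha > 2")
    mean = alpha * beta / (alpha - 1)
    var = alpha * beta ** 2 / (alpha - 2) - mean ** 2
    return mean, var

mean, var = pareto_mean_var(2.5, 3)
payment = mean - 3          # expected payment after the $300 deductible (hundreds)
print(mean, var, payment)   # 5.0 20.0 2.0
```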


8.6.5 The Failure Rate of a Pareto Random Variable


In Equation (8.14) we defined the failure (hazard) rate of a random variable to be

\[ \lambda(x) = \frac{f(x)}{S(x)}. \]


The reader may wonder why we did not calculate the failure rates of the gamma, normal and lognormal distributions. The answer is that those calculations do not provide a simple answer in closed form. The Pareto distribution, however, does have a failure rate that is easy to find:

\[ \lambda(x) = \frac{f(x)}{S(x)} = \frac{(\alpha/\beta)(\beta/x)^{\alpha+1}}{(\beta/x)^{\alpha}} = \frac{\alpha}{x}. \]


This failure rate does not make sense if x represents the age of a machine part or a human being, since it decreases with age. Unfortunately, humans and their cars tend to fail at higher rates as the age x increases.

Although the Pareto model may not be appropriate for failure time applications, it is used to model other phenomena such as claim amounts. The decreasing failure rate causes the Pareto density curve to give higher probabilities for large values of x than you might expect. For example, despite the fact that the density graph for the claim distribution in this section appears to be approaching zero when x = 12, the probability P(X > 12) is .031. The section of the density graph to the right of x = 12 is called the tail of the distribution. The Pareto distribution is referred to as heavy-tailed.⁸
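Below is a numerical check of the two claims above. The ratio f(x)/S(x) should equal α/x at every x, and the tail probability S(12) = (3/12)^2.5 is .03125, about .031. This is a sketch with hypothetical helper names.

```python
def pareto_density(x, alpha, beta):
    """f(x) = (alpha/beta) (beta/x)^(alpha+1) for x >= beta, Equation (8.25)."""
    return (alpha / beta) * (beta / x) ** (alpha + 1)

def pareto_survival(x, alpha, beta):
    """S(x) = P(X > x) = (beta/x)^alpha for x >= beta."""
    return (beta / x) ** alpha

alpha, beta = 2.5, 3
for x in (3, 6, 12):
    hazard = pareto_density(x, alpha, beta) / pareto_survival(x, alpha, beta)
    print(x, round(hazard, 4), round(alpha / x, 4))   # the two columns agree

s12 = pareto_survival(12, alpha, beta)
print(round(s12, 5))   # 0.03125, i.e. about .031
```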


8.7 The Weibull Distribution 


8.7.1 Application of the Weibull Distribution 


Researchers who study units that fail or die often like to think in terms of the failure rate. They might decide to use an exponential distribution model if they believe the failure rate is constant. If they believe that the failure rate increases with time or age, then the Weibull distribution can provide a useful model. We will show that the failure rate of a Weibull distribution is of the form λ(x) = αβx^(α−1). When α > 1 and β > 0, this failure rate increases with x and older units really do have a higher rate of failure.


8.7.2 The Density Function of the Weibull Distribution


This density function has two parameters, α and β. It looks complicated, but it is easy to integrate and has a simple failure rate.


8 See [8] Klugman et al., Second Edition, page 48 for a discussion of this. 
Weibull Density Function
Parameters α > 0 and β > 0

\[ f(x) = \alpha\beta x^{\alpha-1} e^{-\beta x^{\alpha}}, \quad x \ge 0 \]

Example 8.29 When α = 2 and β = 2.5, the density function is

\[ f(x) = 5x\, e^{-2.5x^2}, \quad x \ge 0. \]

It is graphed in the next figure.

[Figure: Weibull Density Function]

The reader should note that if α = 1, the density function becomes the exponential density βe^(−βx). Thus the exponential distribution is a special case of the Weibull distribution. □


8.7.3 The Cumulative Distribution Function and Probability Calculations


The Weibull density function can be integrated by substitution since αx^(α−1) is the derivative of x^α. Thus the cumulative distribution function can be found in closed form. (The reader can check the F(x) given below without integration by showing that F′(x) = f(x).)

Weibull Cumulative Distribution Function
Parameters α > 0 and β > 0

\[ F(x) = 1 - e^{-\beta x^{\alpha}}, \quad x \ge 0 \]

For the density function in Example 8.29,
\[ F(x) = 1 - e^{-2.5x^2}, \quad x \ge 0. \]


Once we have F(x), we can use it to find probabilities as we did with 
the Pareto distribution. 


Example 8.30 Suppose the Weibull random variable X with α = 2 and β = 2.5 represents the lifetime in years of a machine part. Find the probability that (a) the part fails during the first 6 months; (b) the part lasts longer than one year.
Solution
(a) Convert 6 months to 0.5 years.
\[ P(X \le .5) = F(.5) = 1 - e^{-2.5(.5)^2} \approx .465 \]
(b) \[ P(X > 1) = S(1) = 1 - F(1) = e^{-2.5} \approx .082 \] □
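Because F(x) has closed form, Example 8.30 needs only `math.exp`. The sketch below uses the text's parametrization F(x) = 1 − e^(−βx^α). Other references and software may use a different but equivalent form. `weibull_cdf` is a hypothetical helper.

```python
import math

def weibull_cdf(x, alpha, beta):
    """F(x) = 1 - exp(-beta * x^alpha) for x >= 0 (the text's parametrization)."""
    return 1.0 - math.exp(-beta * x ** alpha) if x > 0 else 0.0

alpha, beta = 2, 2.5          # lifetime in years
print(round(weibull_cdf(0.5, alpha, beta), 3))     # 0.465  (fails within 6 months)
print(round(1 - weibull_cdf(1, alpha, beta), 3))   # 0.082  (survives one year)
```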
8.7.4 The Mean and Variance of the Weibull Distribution
The mean and variance of the Weibull distribution are calculated using values of the gamma function Γ(x), which was defined in Equation (8.8) of Section 8.2.1. We will not give derivations here. The reader will be asked to derive E(X) using Equation (8.10) in Exercise 8-49.


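Weibull Distribution Mean and Variance
Parameters α > 0 and β > 0

\[ E(X) = \frac{\Gamma\left(1 + \frac{1}{\alpha}\right)}{\beta^{1/\alpha}} \qquad (8.30a) \]
\[ V(X) = \frac{1}{\beta^{2/\alpha}}\left[\Gamma\left(1 + \frac{2}{\alpha}\right) - \Gamma\left(1 + \frac{1}{\alpha}\right)^2\right] \qquad (8.30b) \]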
The reader may recall that when n is a positive integer, then Γ(n) = (n − 1)!. In cases where the above gamma functions are applied to non-integral arguments, calculation of the mean and variance may require some work. However, the calculations can be done using numerical integration on modern calculators. In the following example we will be able to avoid this by using the known gamma function value Γ(1/2) = √π.

Example 8.31 We return to the Weibull random variable X with α = 2 and β = 2.5. The mean and variance of X are

\[ E(X) = \frac{\Gamma\left(1 + \frac{1}{2}\right)}{2.5^{1/2}} = \frac{\frac{1}{2}\Gamma\left(\frac{1}{2}\right)}{\sqrt{2.5}} = \frac{\sqrt{\pi}}{2\sqrt{2.5}} \approx .5605 \]

and

\[ V(X) = \frac{1}{2.5}\left[\Gamma\left(1 + \frac{2}{2}\right) - \Gamma\left(1 + \frac{1}{2}\right)^2\right] = \frac{1}{2.5}\left[1 - \frac{\pi}{4}\right] \approx .085841. \] □
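Python's `math.gamma` evaluates Γ at non-integer arguments, so Equations (8.30a) and (8.30b) can be computed directly without the Γ(1/2) = √π shortcut. Below is a sketch with a hypothetical helper name.

```python
import math

def weibull_mean_var(alpha, beta):
    """E(X) and V(X) from Equations (8.30a) and (8.30b)."""
    g1 = math.gamma(1 + 1 / alpha)
    g2 = math.gamma(1 + 2 / alpha)
    mean = g1 / beta ** (1 / alpha)
    var = (g2 - g1 ** 2) / beta ** (2 / alpha)
    return mean, var

mean, var = weibull_mean_var(2, 2.5)
print(round(mean, 4), round(var, 6))   # 0.5605 0.085841
```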


8.7.5 The Failure Rate of a Weibull Random Variable 


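The Weibull distribution is of special interest due to its failure rate. Since S(x) = 1 − F(x) = e^(−βx^α),

\[ \lambda(x) = \frac{f(x)}{S(x)} = \frac{\alpha\beta x^{\alpha-1} e^{-\beta x^{\alpha}}}{e^{-\beta x^{\alpha}}} = \alpha\beta x^{\alpha-1}. \]

As previously mentioned, the Weibull failure rate is proportional to the power x^(α−1), which is a positive power of x when α > 1. Thus the Weibull random variable can be used to model phenomena for which the failure rate increases with age.

Example 8.32 For the Weibull random variable X with α = 2 and β = 2.5, the failure rate is λ(x) = 5x. □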
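Below is a quick numerical confirmation that f(x)/S(x) reduces to αβx^(α−1), which is 5x when α = 2 and β = 2.5. The sketch evaluates both sides at a few ages. They should agree to floating-point rounding.

```python
import math

alpha, beta = 2, 2.5

def f(x):
    """Weibull density alpha*beta*x^(alpha-1)*exp(-beta*x^alpha)."""
    return alpha * beta * x ** (alpha - 1) * math.exp(-beta * x ** alpha)

def S(x):
    """Weibull survival function exp(-beta*x^alpha)."""
    return math.exp(-beta * x ** alpha)

for x in (0.25, 0.5, 1.0):
    # hazard from the definition vs. the closed form alpha*beta*x^(alpha-1) = 5x
    print(x, f(x) / S(x), alpha * beta * x ** (alpha - 1))
```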
Technology Note


Probability calculations for the Weibull distribution do not require sophisticated technology, since F(x) has an exponential form that can be easily evaluated. Microsoft® EXCEL does have a WEIBULL() function to calculate values of f(x) and F(x). The reader needs to use this with some care, since a different (equivalent) form of the Weibull is used there, and parameters must be converted from our form to EXCEL form.

Technology can be used to evaluate the mean and variance when the gamma function has arguments that are not integers. We can either evaluate the defining integral for the gamma function to complete the calculation of Equations (8.30a) and (8.30b), or directly evaluate the integrals which define E(X) and E(X²). The latter approach was used by the authors to check the values found in Example 8.31 using the TI-92 calculator.
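The sketch below covers both points of this note. It assumes, as we understand Excel's documentation, that Excel's WEIBULL uses the scale form F(x) = 1 − e^(−(x/s)^α) with α as the shape. Matching that form to ours requires s = β^(−1/α). The code checks this conversion in Python rather than calling Excel. It then estimates E(X) by integrating x·f(x) with the midpoint rule, which is our choice and not the TI-92's method. The upper limit of 10 is an assumption that is safe here, because the neglected tail is of order e^(−250).

```python
import math

def weibull_cdf_text(x, alpha, beta):
    """The text's form: F(x) = 1 - exp(-beta * x^alpha)."""
    return 1.0 - math.exp(-beta * x ** alpha)

def weibull_cdf_scale(x, shape, scale):
    """Scale form (assumed to match Excel): F(x) = 1 - exp(-(x/scale)^shape)."""
    return 1.0 - math.exp(-((x / scale) ** shape))

alpha, beta = 2, 2.5
scale = beta ** (-1 / alpha)    # convert the text's beta to a scale parameter
print(round(scale, 6))          # 0.632456

for x in (0.25, 0.5, 1.0):
    assert math.isclose(weibull_cdf_text(x, alpha, beta),
                        weibull_cdf_scale(x, alpha, scale))

# Numerical check of E(X): midpoint rule for the integral of x f(x) on [0, 10].
def density(x):
    return alpha * beta * x ** (alpha - 1) * math.exp(-beta * x ** alpha)

n, upper = 20000, 10.0
h = upper / n
mean = h * sum((i + 0.5) * h * density((i + 0.5) * h) for i in range(n))
print(round(mean, 4))           # 0.5605
```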


8.8 The Beta Distribution 


8.8.1 Applications of the Beta Distribution


The beta distribution is defined on the interval [0, 1]. Thus the beta distri- 
bution can be used to model random variables whose outcomes are 
percents ranging from 0% to 100% and written in decimal form. It can be 
applied to study the percent of defective units in a manufacturing process, 
the percent of errors made in data entry, the percent of clients satisfied 
with their service, and similar variables. Herzog [4] used properties of the 
beta distribution to study errors in the recording of FHA mortgages.⁹


8.8.2 The Density Function of the Beta Distribution 


The beta distribution has two parameters, α and β. The gamma function Γ(x) is used in this density function.

Beta Density Function
Parameters α > 0 and β > 0

\[ f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1}(1 - x)^{\beta-1}, \quad 0 < x < 1 \qquad (8.32) \]

9 See Chapter 11.

The density function f(x) may be difficult to integrate if α or β is not an integer, but it will be a polynomial for integral values of α and β.


Example 8.33 A management firm handles investment accounts for a large number of clients. The percent of clients who telephone the firm for information or services in a given month is a beta random variable with α = 4 and β = 3. The density function is given by

\[ f(x) = \frac{\Gamma(7)}{\Gamma(4)\,\Gamma(3)}\, x^3(1 - x)^2 = 60x^3(1 - x)^2 = 60(x^3 - 2x^4 + x^5), \quad 0 < x < 1. \]

The graph is shown in the next figure.

[Figure: Beta Density Function] □


8.8.3 The Cumulative Distribution Function and Probability 
Calculations 


When α − 1 and β − 1 are non-negative integers, the cumulative distribution function can be found by integrating a polynomial.


Example 8.34 For the random variable X in Example 8.33, F(x) is found by integration. For 0 ≤ x ≤ 1,

\[ F(x) = \int_0^x f(u)\,du = \int_0^x 60(u^3 - 2u^4 + u^5)\,du = 60\left(\frac{x^4}{4} - \frac{2x^5}{5} + \frac{x^6}{6}\right). \]

The probability that the percent of clients phoning for service in a month is less than 40% is
\[ F(.40) = .17920. \]

The probability that the percent of clients phoning for service in a month is greater than 60% is
\[ 1 - F(.60) = 1 - .54432 = .45568. \]
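The polynomial CDF of Example 8.34 can be coded in one line and checked against the text's values. The sketch also confirms that F(1) = 1, which is a quick sanity check that the integration was done correctly. `beta43_cdf` is a hypothetical name.

```python
def beta43_cdf(x):
    """F(x) for the beta(4, 3) density 60(x^3 - 2x^4 + x^5), 0 <= x <= 1."""
    return 60 * (x ** 4 / 4 - 2 * x ** 5 / 5 + x ** 6 / 6)

print(round(beta43_cdf(0.40), 5))        # 0.1792
print(round(1 - beta43_cdf(0.60), 5))    # 0.45568
print(round(beta43_cdf(1.0), 10))        # 1.0 (total probability)
```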


Calculations are more difficult when α and β are not integers, but technology will help us obtain the desired results. □


8.8.4 A Useful Identity 


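The area between the density function graph and the x-axis must be 1, so the integral of the density function from 0 to 1 must be 1:

\[ \int_0^1 f(x)\,dx = \int_0^1 \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1}(1 - x)^{\beta-1}\,dx = 1. \]

We have stated this result without proof. A proof would be required to show that f(x) is truly a density function. Once we accept the result, we can derive a useful identity:

\[ \int_0^1 x^{\alpha-1}(1 - x)^{\beta-1}\,dx = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)} \qquad (8.33) \]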


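Example 8.35 Let α = 4 and β = 3. Then

\[ \int_0^1 x^3(1 - x)^2\,dx = \frac{\Gamma(4)\,\Gamma(3)}{\Gamma(7)} = \frac{3!\,2!}{6!} = \frac{12}{720} = \frac{1}{60}. \] □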
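Below is a numerical check of the gamma-function identity for α = 4 and β = 3. A midpoint-rule estimate of the integral is compared with Γ(4)Γ(3)/Γ(7) from `math.gamma`, and both should be close to 1/60. The midpoint rule and the helper name `beta_integral` are our own choices.

```python
import math

def beta_integral(a, b, n=10000):
    """Midpoint-rule estimate of the integral of x^(a-1) (1-x)^(b-1) on [0, 1]."""
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
                   for i in range(n))

a, b = 4, 3
numeric = beta_integral(a, b)
exact = math.gamma(a) * math.gamma(b) / math.gamma(a + b)   # gamma-function identity
print(round(numeric, 6), round(exact, 6), round(1 / 60, 6))  # all about 0.016667
```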
8.8.5 The Mean and Variance of a Beta Random Variable
The identity in Equation (8.33) can be used to find the mean and variance of a beta random variable X. The reader is asked to find E(X) in Exercise 8-55. The mean and variance are given below.


Beta Distribution Mean and Variance
Parameters α > 0 and β > 0

\[ E(X) = \frac{\alpha}{\alpha + \beta} \]
\[ V(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} \]
Example 8.36 The mean and variance of the percent of clients
calling in for service in the preceding examples are


$$E(X) = \frac{4}{4+3} = \frac{4}{7} \approx .5714$$


and


$$V(X) = \frac{4\cdot 3}{(4+3)^{2}(4+3+1)} = \frac{12}{392} \approx .0306. \qquad \square$$
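The mean and variance formulas can be wrapped in a small helper and cross-checked against a direct numerical integration of x·f(x) and x²·f(x). The function names here are hypothetical, and the midpoint rule is just one convenient choice of quadrature.

```python
def beta_mean_var(alpha: float, beta: float) -> tuple[float, float]:
    """Mean and variance of a beta(alpha, beta) random variable."""
    s = alpha + beta
    mean = alpha / s
    var = alpha * beta / (s**2 * (s + 1))
    return mean, var


def f(x: float) -> float:
    """Density from Example 8.33 (alpha = 4, beta = 3)."""
    return 60 * x**3 * (1 - x) ** 2


n = 100_000
h = 1.0 / n
grid = [(i + 0.5) * h for i in range(n)]
ex = h * sum(x * f(x) for x in grid)        # E(X) by numerical integration
ex2 = h * sum(x * x * f(x) for x in grid)   # E(X^2) by numerical integration

mean, var = beta_mean_var(4, 3)
print(round(mean, 4), round(var, 4))
print(round(ex, 4), round(ex2 - ex**2, 4))
```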


Technology Note 


When either α or β is not an integer, technology can be used to
find probabilities for a beta random variable. Microsoft® EXCEL has a
function BETADIST() which gives values of F(x) for the beta
distribution. Alternatively, the TI-83 or TI-89 can be used to integrate
the density function. For example, when α = 4 and β = 1.5, Microsoft
EXCEL gives the value F(.40) = .05189. The reader will be asked to
show in Exercise 8-50 that the density function for α = 4 and β = 1.5 is


$$f(x) = \frac{315}{32}\, x^{3}\sqrt{1-x}, \quad \text{for } 0 < x < 1.$$


The TI-83 gives the numerical result


$$\int_0^{.40} f(x)\,dx = .05189.$$
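For readers without EXCEL or a TI calculator, the same integral can be approximated in a few lines. This is a sketch, not the procedure used to produce .05189 in the text: it assumes Python's `math.gamma` for the normalizing constant and uses composite Simpson's rule (our choice of method) on [0, .40].

```python
import math


def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Beta density Gamma(a+b)/(Gamma(a)Gamma(b)) x^(a-1) (1-x)^(b-1) on (0, 1)."""
    const = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return const * x ** (alpha - 1) * (1 - x) ** (beta - 1)


def simpson(g, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


const = math.gamma(5.5) / (math.gamma(4) * math.gamma(1.5))
print(const, 315 / 32)  # normalizing constant for alpha = 4, beta = 1.5

p = simpson(lambda x: beta_pdf(x, 4, 1.5), 0.0, 0.40)
print(round(p, 5))
```

Simpson's rule is exact for cubics and converges quickly for smooth integrands such as this one on [0, .40].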
0 


8.9 Fitting Theoretical Distributions to Real Problems 


The reader may be wondering how a researcher first decides that a 
particular distribution fits a specific applied problem. Why are claim 
amounts modeled by Pareto or lognormal distributions? Why do heights 
follow normal distributions? This kind of model selection is difficult, 
and it may involve many methods which are not developed in this text. 
However, there is one simple approach which is commonly used. If a 
researcher is familiar with the shapes of various distributions, he or she 
can collect real data on claims and try to match the shapes of the real 
data histograms with the patterns of known distributions. There are 
statistical methods for testing goodness of fit which the researcher can 
then use to see if the chosen theoretical distribution fits the data fairly 
well. 
The choice of distribution to apply to a problem is really the subject
of another text. In a probability text, we discuss how to use the
distribution that applies to a particular problem, not how to find the 
distribution. The distribution appears somewhat like a rabbit pulled out 
of a hat. The reader should be aware that a good deal of work may have 
gone into the selection of the particular rabbit that suddenly appeared. 


8.10 Exercises 


8.1 The Uniform Distribution 
8-1. Derive Equation (8.5b). 


8-2. If T is the random variable in Example 8.3 whose distribution is
uniform on [0, 100], find E(T) and V(T).


8-3. In a hospital the time of birth of a baby within an hour interval 
(e.g. between 5:00 and 6:00 in the morning) is uniformly 
distributed over that hour. What is the probability that a baby is 
born between 5:15 and 5:25, given that it was born between 5:00 
and 6:00? 


8-4. On a large construction site the lengths of pieces of lumber are
rounded off to the nearest centimeter. Let X be the rounding
error random variable (the actual length of a piece of lumber
minus the rounded-off value). Suppose that X is uniformly
distributed over [−.50, .50]. Find (a) P(−.10 ≤ X ≤ .20);
(b) V(X).


8-5. A professor gives a test to a large class. The time limit for the
test is 50 minutes, and the first student to finish is done in 35
minutes. The professor assumes that the random variable T for
the time it takes a student to finish the test is uniformly
distributed over [35, 50].

(a) Find E(T) and V(T).
(b) At what time T will 60 percent of the students be finished?
8-6. Let T be a random variable whose distribution is uniform on
[a, b] and a ≤ c < d ≤ b. Suppose you are given that the value
of T falls in the interval [c, d]. Let Y be the conditional random
variable for those values of T that are in [c, d]. Show that the
distribution of Y is uniform over [c, d].


8-7. Suppose you consider the subset of the population in Example
8.3 who survive to age 40. If T is the random variable for the
age at time of death of these survivors, T has a uniform
distribution over [40, 100].

(a) Find E(T) and V(T).

(b) What is P(T > 57) for this group? (Compare this with the
result in Example 8.3.)


8-8. For the population in Example 8.3 where the time until death
random variable T is uniform over [0, 100], consider a couple
whose ages are 45 and 50. Assume that their deaths are
independent events.

(a) What is the probability that they both live at least 20 more
years?

(b) What is the probability that both die in the next 20 years?


8.2 The Exponential Distribution


8-9. Tests on a certain machine part have determined that the mean
time until failure of this part is 500 hours. Assume that the time
T until failure of this part is exponentially distributed.

(a) What is the probability that one of these parts will fail
within 300 hours?

(b) What is the probability that one of these parts will still be
working after 900 hours?


8-10. If T has an exponential distribution with parameter λ, what is
the median of T?


8-11. For a certain population the time until death random variable T
has an exponential distribution with mean 60 years.

(a) What is the probability that a member of this population
will die by age 50?

(b) What is the probability that a member of this population
will live to be 100?
8-12. If T is uniformly distributed over [a, b], what is its failure rate?


8-13. Researchers at a medical facility have discovered a virus whose
mean incubation period (time from being infected until symptoms
appear) is 38 days. Assume the incubation period has an
exponential distribution.

(a) What is the probability that a patient who has just been
infected will show symptoms in 25 days?

(b) What is the probability that a patient who has just been
infected will not show symptoms for at least 30 days?


8-14. If T has an exponential distribution, show that P[T ≤ E(T)] is
F[E(T)] = 1 − e^(−1) ≈ .632.


8-15. A city engineer has studied the frequency of accidents at two
busy intersections. He has determined that the time T in months
between accidents at each intersection has an exponential
distribution. The parameters for these two distributions are 2 and 2.5.
Assume that the occurrence of accidents at these intersections is
independent.

(a) What is the probability that there are no accidents at either
intersection in the next month?

(b) What is the probability that there will be no accidents for
at least one of these intersections in the next month?


8-16. If T has an exponential distribution with parameter .15, what are
the 25th and 75th percentiles for T?


8-17. Using Equation (8.8) and integration by parts, derive the identity
Γ(n) = (n − 1)·Γ(n − 1).


8-18. Let T be a random variable whose distribution is exponential
with parameter λ. Show that P(T > a + b | T > a) = P(T > b).


8-19. Consider the population in Exercise 8-11.

(a) What is the probability that a member of this population
who lives to age 40 will die by age 50?

(b) What is the probability that a person who lives to age 40
will then live to age 100?
8.3 The Gamma Distribution


8-20. Using Equation (8.10) and the result in Exercise 8-17, show that
the mean of the gamma distribution with parameters α and β is
α/β.


8-21. Use Equation (8.10) and Exercise 8-17 to show that if X has a gamma
distribution with parameters α and β, then E(X²) = α(α + 1)/β²
and hence V(X) = α/β².


8-22. At a dangerous intersection accidents occur at a rate of 2.5 per
month, and the time between accidents is exponentially
distributed. Let T be the random variable for the waiting time
from the beginning of observation until the third accident. Find
E(T) and V(T).


8-23. Suppose a company hires new people at a rate of 8 per year and
the time between new hires is exponentially distributed. What
are the mean and variance of the time until the company hires its
12th new employee?


8-24. A gamma distribution has a mean of 18 and a variance of 27.
What are α and β for this distribution?


8-25. A gamma distribution has parameters α = 2 and β = 3. Find
(a) F(x); (b) P(0 < X ≤ 3); (c) P(1 ≤ X ≤ 2).


8-26. The length of stay X in a hospital for a certain disease has a
gamma distribution with parameters α = 2 and β = 1/3. The
cost of treatment in the hospital is C = 500X + 50X². What is
the expected cost of a hospital treatment for this disease?


8.4 The Normal Distribution


8-27. Using the z-table in Appendix A, find the following probabilities:


(a) P(−1.15 < Z < 1.56)  (b) P(0.15 < Z < 2.13)
(c) P(|Z| < 1.0)  (d) P(|Z| ≥ 1.65).
8-28. Using the z-tables in Appendix A, find the value of z that
satisfies the following probabilities:


(a) P(Z < z) = .8238  (b) P(Z < z) = .0287
(c) P(Z > z) = .9115  (d) P(Z > z) = .1660
(e) P(|Z| > z) = .10  (f) P(|Z| < z) = .95


8-29. Let Z be the standard normal random variable. If z > 0 and
F_Z(z) = α, what are F_Z(−z) and P(−z ≤ Z ≤ z)?


8-30. If X is a normal random variable with a mean of 17.1 and a
standard deviation of 3.2, what is P(14 ≤ X ≤ 25)?


8-31. An insurance company has 5000 policies and assumes these
policies are all independent. Each policy is governed by the
same distribution with a mean of $495 and a variance of
$30,000. What is the probability that the total claims for the year
will be less than $2,500,000?


8-32. A company manufactures engines. Specifications require that the
length of a certain rod in this engine be between 7.48 cm. and
7.52 cm. The lengths of the rods produced by their supplier have
a normal distribution with a mean of 7.505 cm. and a standard
deviation of .01 cm.


(a) What is the probability that one of these rods meets these
specifications?

(b) If a worker selects 4 of these rods at random, what is the
probability that at least 3 of them meet these specifications?


8-33. The lifetimes of light bulbs produced by a company are normally
distributed with mean 1500 hours and standard deviation 125
hours.


(a) What is the probability that a bulb will last at least 1400
hours?

(b) If 3 new bulbs are installed at the same time, what is the
probability that they will all still be burning after 1400
hours?
8-34. If a number is selected at random from the interval [0, 1], its
value has a uniform distribution over that interval. Let S be the
random variable for the sum of 50 numbers selected at random
from [0, 1]. What is P(24 ≤ S ≤ 27)?


8-35. Let X have a normal distribution with mean 25 and unknown
standard deviation. If P(X ≤ 29.9) = .9192, what is σ?


8.5 The Lognormal Distribution


8-36. If Y = e^X, where X is a normal random variable with μ = 5
and σ = .40, what are E(Y) and V(Y)?


8-37. If Y is lognormal and X, the normally distributed exponent, has
parameters μ = 5.2 and σ = .80, what is P(100 ≤ Y ≤ 500)?


8-38. The claim severity random variable for an insurance company is
lognormal, and the normally distributed exponent has mean 6.8
and standard deviation 0.6. What is the probability that a claim
is greater than $1750?


8-39. If Y is a lognormal random variable, and the normally distributed
exponent has parameters μ and σ, what is the median of Y?


8-40. For the stock in Example 8.24, whose value in one year is
Y = 100e^X where X is normal with parameters μ = .10 and
σ = .03, what is the probability that the value of the stock in one
year will be (a) greater than 112.50; (b) less than 107.50?


8-41. If Y = e^X is a lognormal random variable with E(Y) = 2,500
and V(Y) = 1,000,000, what are the parameters μ and σ for X?
8.6 The Pareto Distribution


8-42. Let X be the Pareto random variable with parameters α and β,
α > 2 and x > β > 0.


(a) Verify that F(x) = 1 − (β/x)^α.

(b) Verify that E(X) = αβ/(α − 1).

(c) Verify that E(X²) = αβ²/(α − 2), and use this result to
obtain V(X).


8-43. For the Pareto random variable with α = 3.5 and β = 4, find
(a) E(X); (b) V(X); (c) the median of X; (d) P(6 ≤ X ≤ 12).


8-44. A comprehensive insurance policy on commercial trucks has a
deductible of $500. The random variable for the loss amount
(before deductible) on claims filed has a Pareto distribution with
a failure rate of 3.5/x (x measured in hundreds of dollars). Find
(a) the mean loss amount; (b) the expected value of the amount
paid on a single claim; and (c) the variance of the amount of a
single loss.


8.7 The Weibull Distribution


8-45. It can be shown (although beyond the scope of this text) that
Γ(1/2) = √π. Using this and the result of Exercise 8-17, find (a)
Γ(3/2); (b) Γ(5/2); (c) Γ(7/2). (Can you see a pattern?)


8-46. Let X be the Weibull random variable with α = 3 and β = 3.5.
Find (a) P(X ≤ 0.4); (b) P(X > 0.8).


8-47. What is the failure rate for the random variable in Exercise
8-46?


8-48. For the Weibull random variable X with α = 2 and β = 3.5,
find (a) E(X); (b) V(X); (c) P(.25 ≤ X ≤ .75).


8-49. Using Equation (8.10), verify that the mean of a Weibull
distribution is Γ(1 + 1/α)/β^(1/α). (Hint: Transform the integral using
the substitution u = x^α.)
8.8 The Beta Distribution


8-50. Find the density function for the beta distribution with α = 4 and
β = 1.5. (Hint: Use the results of Exercise 8-17.)


8-51. Find the value of k so that f(x) = kx*(1-x for 0 < x < 1 is a
beta density function.


8-52. A meter measuring the volume of a liquid put into a bottle has an
accuracy of ±1 cm³. The absolute value of the error has a beta
distribution with α = 3 and β = 2. What are the mean and
variance for this error?


8-53. In Exercise 8-52, what is the probability that the error is no more
than 0.5 cm³?


8-54. A company markets a new product and surveys customers on
their satisfaction with this product. The fraction of customers
who are dissatisfied has a beta distribution with α = 2 and
β = 4. What is the probability that no more than 30 percent of
the customers are dissatisfied?


8-55. Using Equation (8.33), verify that the mean of the beta distribution
is α/(α + β).


Sample Actuarial Examination Problems 


8-56. The time to failure of a component in an electronic device has an
exponential distribution with a median of four hours.


Calculate the probability that the component will work without
failing for at least five hours.


8-57. The waiting time for the first claim from a good driver and the
waiting time for the first claim from a bad driver are independent
and follow exponential distributions with means 6 years and 3 years,
respectively.


What is the probability that the first claim from a good driver
will be filed within 3 years and the first claim from a bad driver
will be filed within 2 years?
8-58. The lifetime of a printer costing 200 is exponentially distributed
with mean 2 years. The manufacturer agrees to pay a full refund
to a buyer if the printer fails during the first year following its
purchase, and a one-half refund if it fails during the second year.


If the manufacturer sells 100 printers, how much should it expect
to pay in refunds?


8-59. The number of days that elapse between the beginning of a
calendar year and the moment a high-risk driver is involved in an
accident is exponentially distributed. An insurance company
expects that 30% of high-risk drivers will be involved in an
accident during the first 50 days of a calendar year.


What portion of high-risk drivers are expected to be involved in
an accident during the first 80 days of a calendar year?


8-60. An insurance policy reimburses dental expense, X, up to a
maximum benefit of 250. The probability density function for X
is:


$$f(x) = \begin{cases} c\,e^{-0.004x} & \text{for } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$


where c is a constant.
Calculate the median benefit for this policy.


8-61. You are given the following information about N, the annual
number of claims for a randomly selected insured:


P(N = 0) = 1/2   P(N = 1) = 1/3   P(N > 1) = 1/6


Let S denote the total annual claim amount for an insured. When
N = 1, S is exponentially distributed with mean 5. When N > 1,
S is exponentially distributed with mean 8.


Determine P(4 < S < 8).
8-62. An insurance company issues 1250 vision care insurance
policies. The number of claims filed by a policyholder under a
vision care insurance policy during one year is a Poisson random
variable with mean 2. Assume the numbers of claims filed by
distinct policyholders are independent of one another.


What is the approximate probability that there is a total of
between 2450 and 2600 claims during a one-year period?


8-63. The total claim amount for a health insurance policy follows a
distribution with density function


$$f(x) = \frac{1}{1000}\, e^{-x/1000}, \quad \text{for } x > 0.$$


The premium for the policy is set at 100 over the expected total
claim amount.


If 100 policies are sold, what is the approximate probability that
the insurance company will have claims exceeding the premiums
collected?


8-64. A city has just added 100 new female recruits to its police force.
The city will provide a pension to each new hire who remains
with the force until retirement. In addition, if the new hire is
married at the time of her retirement, a second pension will be
provided for her husband. A consulting actuary makes the
following assumptions:


(i) Each new recruit has a 0.4 probability of remaining with the
police force until retirement.


(ii) Given that a new recruit reaches retirement with the police
force, the probability that she is not married at the time of
retirement is 0.25.


(iii) The number of pensions that the city will provide on behalf
of each new hire is independent of the number of pensions it
will provide on behalf of any other new hire.


Determine the probability that the city will provide at most 90
pensions to the 100 new hires and their husbands.
8-65. In an analysis of healthcare data, ages have been rounded to the
nearest multiple of 5 years. The difference between the true age
and the rounded age is assumed to be uniformly distributed on
the interval from −2.5 years to 2.5 years. The healthcare data are
based on a random sample of 48 people.


What is the approximate probability that the mean of the rounded
ages is within 0.25 years of the mean of the true ages?


8-66. A charity receives 2025 contributions. Contributions are assumed
to be independent and identically distributed with mean 3125 and
standard deviation 250.


Calculate the approximate 90th percentile for the distribution of
the total contributions received.


Chapter 9 
Applications for Continuous 
Random Variables 


9.1 Expected Value of a Function of a Random Variable
9.1.1 Calculating E[g(X)]
In Section 7.3.2 we gave the integral which is used for the expected
value of g(X), where X is a continuous random variable with density
function f(x).


$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\cdot f(x)\,dx$$


In this section we will give a number of applications which require cal- 
culations of this type. 


9.1.2 Expected Value of a Loss or Claim 


Example 9.1 The amount of a single loss X for an insurance 
policy is exponential, with density function 


$$f(x) = .002e^{-.002x},$$
for x ≥ 0. The expected value of a single loss is


$$E(X) = \frac{1}{.002} = 500. \qquad \square$$
Example 9.2 (Insurance with a deductible) Suppose the insurance
in Example 9.1 has a deductible of $100 for each loss. Find the
expected value of a single claim.

Solution The amount paid for a loss x is given by the function
g(x) below.

$$g(x) = \begin{cases} 0 & 0 \le x \le 100 \\ x - 100 & 100 < x \end{cases}$$


The expected amount of a single claim is

$$
\begin{aligned}
E[g(X)] &= \int_0^{\infty} g(x)\,(.002e^{-.002x})\,dx \\
&= \int_{100}^{\infty} (x-100)(.002e^{-.002x})\,dx \\
&= -e^{-.002x}(x+400)\Big|_{100}^{\infty} = 500e^{-.2} \approx 409.37.
\end{aligned}
$$
□
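The closed-form answer 500e^(−.2) can be cross-checked by integrating g(x)f(x) numerically. The sketch below assumes only Python's standard library; the helper names are ours, and the integral is truncated at x = 20,000, where the exponential tail is negligible.

```python
import math

RATE = 0.002        # exponential parameter from Example 9.1
DEDUCTIBLE = 100.0  # per-loss deductible from Example 9.2


def loss_pdf(x: float) -> float:
    """Density f(x) = .002 e^(-.002x) for x >= 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def payment(x: float) -> float:
    """g(x): nothing is paid below the deductible, x - 100 above it."""
    return max(x - DEDUCTIBLE, 0.0)


def expected_value(g, upper: float = 20_000.0, n: int = 200_000) -> float:
    """Midpoint-rule approximation of E[g(X)] = integral of g(x) f(x) on [0, upper].

    The tail beyond `upper` is ignored; exp(-.002 * 20000) = exp(-40) is about 4e-18.
    """
    h = upper / n
    return h * sum(g((i + 0.5) * h) * loss_pdf((i + 0.5) * h) for i in range(n))


closed_form = 500 * math.exp(-0.2)
numeric = expected_value(payment)
print(round(closed_form, 2), round(numeric, 2))
```

Passing the payment rule in as a function means the same integrator also reproduces E(X) = 500 from Example 9.1 when called with `lambda x: x`.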


Example 9.3 (Insurance with a deductible and a cap) Suppose the 
insurance in Example 9.1 has a deductible of $100 per claim and a 
restriction that the largest amount paid on any claim will be $700. 
(Payments are capped at $700, so that any loss of $800 or larger will
receive a payment of $800 − $100 = $700.) Find the expected value of
a single claim for this insurance.

Solution The amount paid for a loss x is given by the function
h(x) below.


$$h(x) = \begin{cases} 0 & 0 \le x \le 100 \\ x - 100 & 100 < x < 800 \\ 700 & x \ge 800 \end{cases}$$


The expected claim amount E[h(X)] is


$$
\begin{aligned}
E[h(X)] &= \int_0^{\infty} h(x)\,(.002e^{-.002x})\,dx \\
&= \int_{100}^{800} (x-100)(.002e^{-.002x})\,dx + \int_{800}^{\infty} 700\,(.002e^{-.002x})\,dx \\
&= -e^{-.002x}(x+400)\Big|_{100}^{800} + 700\left(-e^{-.002x}\right)\Big|_{800}^{\infty} \\
&= 167.09 + 141.33 = 308.42.
\end{aligned}
$$
□
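As with Example 9.2, the capped expectation can be checked two ways: by evaluating the two closed-form pieces of the worked solution, and by a direct numerical integral of h(x)f(x). This is an illustrative sketch; `payment` and `expected_payment` are our own names, and the midpoint rule is simply a convenient choice.

```python
import math

RATE = 0.002        # exponential parameter from Example 9.1
DEDUCTIBLE = 100.0
CAP = 700.0         # largest payment on any claim


def payment(x: float) -> float:
    """h(x) from Example 9.3: 0 up to 100, then x - 100, never more than 700."""
    return min(max(x - DEDUCTIBLE, 0.0), CAP)


def expected_payment(upper: float = 20_000.0, n: int = 200_000) -> float:
    """Midpoint-rule approximation of E[h(X)]; the tail past `upper` is negligible."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += payment(x) * RATE * math.exp(-RATE * x)
    return total * h


# The two pieces of the worked solution, written in closed form:
piece_1 = 500 * math.exp(-0.2) - 1200 * math.exp(-1.6)  # -e^(-.002x)(x + 400) from 100 to 800
piece_2 = 700 * math.exp(-1.6)                          # 700 * P(X > 800)
closed_form = piece_1 + piece_2
print(round(piece_1, 2), round(piece_2, 2), round(closed_form, 2))
print(round(expected_payment(), 2))
```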
Calculations of the expected value of the amount paid for insurance
with a deductible or for an insurance with a cap are very important
in actuarial mathematics. Because of this, there is a special notation
for each of them.

The expected value of the amount paid on an insurance with loss
random variable X and deductible x is written as E[(X − x)₊]. In
Example 9.2 we found E[(X − 100)...

The expected value of the amount paid on the insurance with
loss random variable X and cap x is written as E[X ∧ x].

In the advanced actuarial text Loss Models: From Data to
Decisions¹ there are formula tables that give simple algebraic formulas
for these amount paid expected values for many random variables 
(including the exponential), thus enabling you to skip the integrations 
and proceed rapidly to the answer. It is not necessary to master this 
advanced material at this point, but it is good to know that a very useful 
simplification is available in many cases. 


9.1.3 Expected Utility 


In Section 6.1.3 we looked at economic decisions based on expected 
utility. The next example illustrates the use of expected utility analysis 
for continuous random variables. 


Example 9.4 A person has the utility function u(w) = √w,
which measures the utility attached to a given level of wealth w. She can
choose between two methods of managing her wealth. Under each
method, the wealth W is a random variable in units of 1000.


Method 1: W₁ is uniformly distributed on [9, 11]. Then the expected
value is E(W₁) = 10 and the density function is


$$f_1(w) = \frac{1}{2}, \quad \text{for } 9 \le w \le 11.$$


Method 2: W₂ is uniformly distributed on [5, 15]. Then the expected
value is E(W₂) = 10 and the density function is


$$f_2(w) = \frac{1}{10}, \quad \text{for } 5 \le w \le 15.$$


¹ See [8].


The two methods have identical expected values, but the investor bases
decisions on expected utility. The expected utilities under the two
methods are as follows:


$$\text{Method 1:}\quad E[u(W_1)] = \int_9^{11} \sqrt{w}\cdot\frac{1}{2}\,dw = \frac{w^{1.5}}{3}\Big|_9^{11} \approx 3.16$$


$$\text{Method 2:}\quad E[u(W_2)] = \int_5^{15} \sqrt{w}\cdot\frac{1}{10}\,dw = \frac{w^{1.5}}{15}\Big|_5^{15} \approx 3.13$$


The person here will choose Method 1 because it has higher expected
utility. Economists would say that a person with a square root utility
function is risk averse and will choose W₁ because W₂ is riskier. □
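The comparison in Example 9.4 reduces to one closed-form integral per method. A minimal Python sketch, assuming only the standard library; the function name `expected_sqrt_utility` and the dictionary layout are illustrative choices, not the book's.

```python
def expected_sqrt_utility(a: float, b: float) -> float:
    """E[sqrt(W)] for W uniform on [a, b].

    Integral of sqrt(w) / (b - a) from a to b = (2/3)(b^1.5 - a^1.5) / (b - a).
    """
    return (2.0 / 3.0) * (b**1.5 - a**1.5) / (b - a)


methods = {"Method 1": (9.0, 11.0), "Method 2": (5.0, 15.0)}
utilities = {name: expected_sqrt_utility(a, b) for name, (a, b) in methods.items()}

for name, (a, b) in methods.items():
    print(name, "E(W) =", (a + b) / 2, "E[u(W)] =", round(utilities[name], 2))

best = max(utilities, key=utilities.get)
print("Preferred:", best)
```

Both methods have the same mean wealth of 10, so the ranking comes entirely from the curvature of u(w) = √w.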


9.2 Moment Generating Functions for Continuous 
Random Variables 


9.2.1 A Review


The moment generating function and its properties were presented in 
Section 6.2. The moment generating function of a random variable X 
was defined by 

$$M_X(t) = E(e^{tX}).$$


The moment generating function has a number of useful properties.


(1) The derivatives of M_X(t) can be used to find the moments
of the random variable X.


$$M_X'(0) = E(X), \quad M_X''(0) = E(X^{2}), \quad \ldots, \quad M_X^{(n)}(0) = E(X^{n})$$


(2) The moment generating function of aX + b can be found
easily if the moment generating function of X is known.


$$M_{aX+b}(t) = e^{bt}\cdot M_X(at)$$
(3) If a random variable X has the moment generating function
of a known distribution, then X has that distribution. 


All of the above properties were developed for discrete random 
variables in Chapter 6. All of them also hold for continuous random 
variables. The only difference for continuous random variables is that 
the expectation in the definition is now calculated using an integral. 


Moment Generating Function 
X continuous with density function f(x)


$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx}\cdot f(x)\,dx$$


Some continuous random variables have useful moment generating 
functions which can be written in closed form and easily applied, and 
others do not. In the following sections, we will give the moment 
generating functions for the gamma and normal random variables 
because these can be found and will have useful applications for us. The 
moment generating function of the uniform distribution will be left as an 
exercise. The beta and lognormal distributions do not have useful mo- 
ment generating functions, and the Pareto moment generating function 
does not exist. 


9.2.2 The Gamma Moment Generating Function 


The gamma distribution provides a nice example of a distribution which 
looks complex, but has a simple moment generating function which can 
be derived in a few lines. To derive it, we will need to use the integral 
given in Equation (8.10). 


$$\int_0^{\infty} x^{n} e^{-ax}\,dx = \frac{\Gamma(n+1)}{a^{n+1}}, \quad \text{for } a > 0 \text{ and } n > -1$$


This identity is valid even if n is not an integer. If n is an integer, then
Γ(n + 1) = n!. Using the identity we can find M_X(t) for a gamma
random variable X with parameters α and β. We will need to assume
that we are only working with values of t for t < β, so that β − t > 0.
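$$
\begin{aligned}
M_X(t) &= \int_0^{\infty} e^{tx}\cdot f(x)\,dx \\
&= \int_0^{\infty} e^{tx}\,\frac{\beta^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\beta x}\,dx \\
&= \frac{\beta^{\alpha}}{\Gamma(\alpha)}\int_0^{\infty} x^{\alpha-1}e^{-(\beta-t)x}\,dx \\
&= \frac{\beta^{\alpha}}{\Gamma(\alpha)}\left(\frac{\Gamma(\alpha)}{(\beta-t)^{\alpha}}\right) = \left(\frac{\beta}{\beta-t}\right)^{\alpha}
\end{aligned}
$$


Moment Generating Function for the Gamma Distribution
Parameters α and β


$$M_X(t) = \left(\frac{\beta}{\beta-t}\right)^{\alpha}, \quad \text{for } t < \beta \qquad (9.2)$$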


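We can now use M_X(t) to find the mean and variance of a gamma
distribution. It is convenient to rewrite M_X(t) as a negative power
function.


$$
\begin{aligned}
M_X(t) &= \beta^{\alpha}(\beta - t)^{-\alpha} \\
M_X'(t) &= \alpha\beta^{\alpha}(\beta - t)^{-\alpha-1} \\
M_X''(t) &= \alpha(\alpha+1)\beta^{\alpha}(\beta - t)^{-\alpha-2} \\
M_X'(0) &= \alpha\beta^{\alpha}\beta^{-\alpha-1} = \frac{\alpha}{\beta} = E(X) \\
M_X''(0) &= \alpha(\alpha+1)\beta^{\alpha}\beta^{-\alpha-2} = \frac{\alpha(\alpha+1)}{\beta^{2}} = E(X^{2}) \\
V(X) &= E(X^{2}) - [E(X)]^{2} = \frac{\alpha}{\beta^{2}}
\end{aligned}
$$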
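The derivative calculations above can be sanity-checked numerically by differentiating M_X(t) at t = 0 with central differences. This is a sketch under stated assumptions: the parameter values α = 2, β = 3 are arbitrary illustrations (not from the text), and `moments_at_zero` is a hypothetical helper name.

```python
def gamma_mgf(t: float, alpha: float, beta: float) -> float:
    """M_X(t) = (beta / (beta - t))**alpha, Equation (9.2); defined only for t < beta."""
    if t >= beta:
        raise ValueError("the gamma MGF exists only for t < beta")
    return (beta / (beta - t)) ** alpha


def moments_at_zero(mgf, h: float = 1e-3) -> tuple[float, float]:
    """Approximate M'(0) and M''(0) by central differences with step h."""
    m_plus, m_zero, m_minus = mgf(h), mgf(0.0), mgf(-h)
    first = (m_plus - m_minus) / (2 * h)
    second = (m_plus - 2 * m_zero + m_minus) / h**2
    return first, second


alpha, beta = 2.0, 3.0  # illustrative values, not taken from the text
ex, ex2 = moments_at_zero(lambda t: gamma_mgf(t, alpha, beta))
print(ex, alpha / beta)               # E(X) = M'(0) versus alpha / beta
print(ex2 - ex**2, alpha / beta**2)   # V(X) versus alpha / beta^2
```

The guard in `gamma_mgf` mirrors the restriction t < β; the small step h keeps every evaluation inside that range.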


We have now derived the mean and variance of the gamma distribution.
Since the exponential distribution is the special case of the gamma with
α = 1, we have also found the moment generating function for the
exponential distribution.
Moment Generating Function for the Exponential Distribution
Parameter β


$$M_X(t) = \frac{\beta}{\beta - t}, \quad \text{for } t < \beta \qquad (9.3)$$


9.2.3 The Normal Moment Generating Function 


We will not derive this function, but will use it to derive an important 
property of the normal distribution. 


Moment Generating Function for the Normal Distribution 
Parameters μ and σ


$$M_X(t) = e^{\mu t + \sigma^{2}t^{2}/2} \qquad (9.4)$$


We can now use M_X(t) to find E(X).


$$M_X'(t) = e^{\mu t + \sigma^{2}t^{2}/2}\,(\mu + \sigma^{2}t)$$
$$M_X'(0) = \mu$$


The reader is asked in Exercise 9-11 to find E(X²) and V(X) using the
moment generating function.

Suppose X has a normal distribution with mean μ and standard
deviation σ, and we need to work with the transformed random variable
Y = aX + b. Property (2) of the moment generating function enables us
to find M_Y(t).


$$M_{aX+b}(t) = e^{bt}\cdot M_X(at) = e^{bt}\cdot e^{\mu(at) + \sigma^{2}(at)^{2}/2} = e^{(a\mu+b)t + a^{2}\sigma^{2}t^{2}/2}$$


The last expression above is the moment generating function of a normal
distribution with mean (aμ + b) and standard deviation |a|σ. Thus
Y = aX + b must follow that distribution. We have derived the following
property of normal random variables. This property was stated
without proof in Section 8.4.3.
Linear Transformation of Normal Random Variables


Let X be a normal random variable with mean μ and standard
deviation σ. Then Y = aX + b is a normal random variable with
mean (aμ + b) and standard deviation |a|σ.
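The MGF argument can be spot-checked numerically: for a few values of t, e^(bt)·M_X(at) should match the normal MGF with mean aμ + b and standard deviation |a|σ. This is only an illustration; the values μ = 10, σ = 2, a = −1.5, b = 4 are hypothetical, chosen with a < 0 so that the absolute value in |a|σ matters.

```python
import math


def normal_mgf(t: float, mu: float, sigma: float) -> float:
    """M_X(t) = exp(mu t + sigma^2 t^2 / 2), Equation (9.4)."""
    return math.exp(mu * t + 0.5 * sigma**2 * t**2)


def mgf_of_linear(t: float, a: float, b: float, mu: float, sigma: float) -> float:
    """Property (2): M_{aX+b}(t) = e^{bt} M_X(at) for X normal(mu, sigma)."""
    return math.exp(b * t) * normal_mgf(a * t, mu, sigma)


mu, sigma = 10.0, 2.0  # hypothetical parameters for X
a, b = -1.5, 4.0       # hypothetical transformation Y = aX + b (a < 0 on purpose)

checks = []
for t in (-0.3, -0.1, 0.0, 0.2, 0.4):
    lhs = mgf_of_linear(t, a, b, mu, sigma)
    rhs = normal_mgf(t, a * mu + b, abs(a) * sigma)  # normal with mean a*mu+b, sd |a|*sigma
    checks.append((t, lhs, rhs))
    print(f"t={t:+.1f}  e^(bt) M_X(at)={lhs:.6f}  normal MGF={rhs:.6f}")
```

A numerical match at a handful of points is not a proof; the algebra above is what establishes the result for every t.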


The moment generating function will prove very useful in Chapter 
11 when we look at sums of random variables. 


9.3 The Distribution of Y = g(X)
9.3.1 An Example 


We have already seen simple methods for finding E[g(X)] and V[g(X)],
but the mean and variance alone are not sufficient to enable us to calculate
probabilities for Y = g(X). Calculation of probabilities requires
knowledge of the distribution of Y. The reasoning necessary to find this
distribution has already been used. It is reviewed in the next example.


Example 9.5 The monthly maintenance cost X for a machine is
an exponential random variable with parameter β = .01. Next year costs
will be subject to 5% inflation. Thus next year's monthly cost is
Y = 1.05X. Find (a) E(Y); (b) P(Y ≤ 100); (c) the cumulative distribution
function F_Y(y).

Solution

(a) The given information implies that

$$E(X) = \frac{1}{.01} = 100.$$
Then E(Y) = 1.05E(X) = 105. We did not need to know
the distribution of Y for this calculation.


(b) We know that the cumulative distribution function for X is
$$F_X(x) = 1 - e^{-.01x}, \quad x \ge 0.$$


Some simple algebra allows us to find the desired probability
for Y using the known cumulative distribution for X.

$$P(Y \le 100) = P(1.05X \le 100) = P\left(X \le \frac{100}{1.05}\right) = F_X\left(\frac{100}{1.05}\right) = 1 - e^{-.01(100/1.05)} \approx .6142$$


(c) We have just found P(Y ≤ 100) = F_Y(100). The same
logic can be used to find P(Y ≤ y) = F_Y(y) for any value
of y ≥ 0.


$$
\begin{aligned}
F_Y(y) = P(Y \le y) &= P(1.05X \le y) \\
&= P\left(X \le \frac{y}{1.05}\right) \\
&= F_X\left(\frac{y}{1.05}\right) = 1 - e^{-.01y/1.05}
\end{aligned}
$$


Note that the set of all possible outcomes for X is the interval [0, ∞).
The set of all possible outcomes for Y = 1.05X is the same interval. □
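A short simulation makes the change-of-variable reasoning of Example 9.5 concrete: draw X, multiply by 1.05, and compare the empirical results with E(Y) = 105 and F_Y(100). This is an illustrative sketch using Python's `random` module; note that `random.expovariate` takes the rate β = .01, not the mean. The seed and sample size are arbitrary choices.

```python
import math
import random

BETA_X = 0.01     # parameter of this year's monthly cost X (Example 9.5)
INFLATION = 1.05  # next year's cost is Y = 1.05 X


def cdf_y(y: float) -> float:
    """F_Y(y) = F_X(y / 1.05) = 1 - exp(-.01 y / 1.05) for y >= 0."""
    return 0.0 if y < 0 else 1.0 - math.exp(-BETA_X * y / INFLATION)


rng = random.Random(2024)  # fixed seed so the run is reproducible
n = 200_000
ys = [INFLATION * rng.expovariate(BETA_X) for _ in range(n)]  # expovariate takes the rate

sim_mean = sum(ys) / n
sim_p = sum(y <= 100 for y in ys) / n
print("E(Y):", round(sim_mean, 1), "(formula gives 105)")
print("P(Y <= 100):", round(cdf_y(100), 4), "simulated:", round(sim_p, 4))
```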


9.3.2 Using F_X(x) to Find F_Y(y) for Y = g(X)
The method of Example 9.5 can be used in a wide range of problems.
Example 9.6 Let X be exponential with β = 3. Find the cumulative
distribution function for Y = √X.
Solution We know that F_X(x) = 1 − e^(−3x).

$$
\begin{aligned}
F_Y(y) = P(Y \le y) &= P(\sqrt{X} \le y) \\
&= P(X \le y^{2}) \\
&= F_X(y^{2}) = 1 - e^{-3y^{2}}
\end{aligned}
$$
The sample space for Y is the interval [0, ∞). Thus F_Y(y) is defined for
y ≥ 0. Note that F_Y(y) is the cumulative distribution function for a
Weibull random variable with α = 2 and β = 3. □
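The Weibull form of F_Y in Example 9.6 can be confirmed by simulation: take square roots of exponential draws with β = 3 and compare empirical proportions with 1 − e^(−3y²). A hedged sketch: the seed, sample size, and test points are arbitrary, and `random.expovariate(3.0)` uses 3 as the rate.

```python
import math
import random


def cdf_y(y: float) -> float:
    """F_Y(y) = 1 - exp(-3 y^2): the Weibull CDF with alpha = 2, beta = 3."""
    return 0.0 if y < 0 else 1.0 - math.exp(-3.0 * y * y)


rng = random.Random(7)  # fixed seed for reproducibility
n = 200_000
ys = [math.sqrt(rng.expovariate(3.0)) for _ in range(n)]  # Y = sqrt(X), X exponential(3)

for y in (0.25, 0.5, 1.0):
    empirical = sum(v <= y for v in ys) / n
    print(f"y={y}: formula {cdf_y(y):.4f}, simulated {empirical:.4f}")
```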
Example 9.7 Let X be exponential with β = 3. Find the cumulative
distribution function for Y = 1 − X.

Solution We know that S_X(x) = e^(−3x).

$$
\begin{aligned}
F_Y(y) = P(Y \le y) &= P(1 - X \le y) = P(X \ge 1 - y) \\
&= S_X(1 - y) = e^{-3(1-y)}
\end{aligned}
$$

The set of all possible outcomes for X is the interval [0, ∞). The set of
all possible outcomes for Y = 1 − X is the interval (−∞, 1]. This
example shows that the sample space for Y may differ from the sample
space for X. □

Finding F_Y(y) gives us all the information that is needed to
calculate probabilities for Y. Thus there is no real need to find the
density function f_Y(y). If the density function is required, it can be
found by differentiating the cumulative distribution function.


$$f_Y(y) = F_Y'(y)$$


Example 9.8 Let X be exponential with β = 3. The density
function for Y = 1 − X is


$$f_Y(y) = \frac{d}{dy}\left(e^{-3(1-y)}\right) = 3e^{-3(1-y)}, \quad \text{for } y \le 1. \qquad \square$$


In each of the previous examples the function g(x) was strictly increasing
or strictly decreasing on the sample space interval [0, ∞). Careful
attention is required if g(x) is not restricted in this manner.


Example 9.9 Let X have a uniform distribution on the interval
[−2, 2]. Then for −2 ≤ a ≤ b ≤ 2,


$$P(a \le X \le b) = \frac{b-a}{4}.$$
Suppose that Y = X². The sample space for Y is the interval [0, 4]. For
y in this interval,


$$
\begin{aligned}
F_Y(y) = P(Y \le y) &= P(X^{2} \le y) \\
&= P(|X| \le \sqrt{y}) \\
&= P(-\sqrt{y} \le X \le \sqrt{y}) \\
&= \frac{\sqrt{y} - (-\sqrt{y})}{4} = \frac{\sqrt{y}}{2}.
\end{aligned}
$$
□
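Because g(x) = x² is not one-to-one on [−2, 2], it is reassuring to check F_Y(y) = √y/2 against a simulation that squares uniform draws directly. This is a sketch only; the seed, sample size, and test points are arbitrary choices.

```python
import math
import random


def cdf_y(y: float) -> float:
    """F_Y(y) = sqrt(y) / 2 on [0, 4] for Y = X^2, X uniform on [-2, 2]."""
    if y <= 0:
        return 0.0
    if y >= 4:
        return 1.0
    return math.sqrt(y) / 2


rng = random.Random(11)  # fixed seed for reproducibility
n = 200_000
ys = [rng.uniform(-2.0, 2.0) ** 2 for _ in range(n)]

for y in (0.25, 1.0, 2.25):
    empirical = sum(v <= y for v in ys) / n
    print(f"y={y}: formula {cdf_y(y):.4f}, simulated {empirical:.4f}")
```

Both tails of X (negative and positive) contribute to each event Y ≤ y, which is why the numerator is √y − (−√y).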


9.3.3 Finding the Density Function for Y = g(X) 
When g(x) Has an Inverse Function 


Examples 9.5 through 9.7 were much simpler than Example 9.9. We will 
see that this is due to the fact that the function g(x) was either strictly 
increasing or strictly decreasing on the sample space interval for X in 
Examples 9.5 through 9.7. For a strictly increasing or decreasing 
function g(x), we can find an inverse function h(y) defined on the 
sample space interval for Y. The reader should recall that if h(y) is the
inverse function of g(x), then 


$$h[g(x)] = x$$
and
$$g[h(y)] = y.$$


The inverse functions for Examples 9.5 through 9.7 are given in the 
following examples. 


Example 9.10 In Example 9.5, $g(x) = 1.05x$, for $x > 0$. Then
$h(y) = y/1.05$, for $y > 0$. □


Example 9.11 In Example 9.6, $g(x) = \sqrt{x}$, for $x > 0$. Then
$h(y) = y^2$, for $y > 0$. □


Example 9.12 In Example 9.7, $g(x) = 1 - x$, for $x > 0$. Then
$h(y) = 1 - y$, for $y < 1$. □
Example 9.9 was more complicated because the function
$g(x) = x^2$, for $-2 \le x \le 2$, did not have an inverse function. We can
see why things are simpler when inverse functions are available if we 
look at two general cases and repeat the reasoning of our previous 
examples. 


Case 1: g(x) is strictly increasing on the sample space for X.
Let h(y) be the inverse function of g(x). The function h(y) will also be
strictly increasing. In this case, we can find $F_Y(y)$ as follows.

$$\begin{aligned}
F_Y(y) &= P(Y \le y) = P(g(X) \le y) \\
&= P[h(g(X)) \le h(y)] \\
&= P(X \le h(y)) \\
&= F_X(h(y))
\end{aligned}$$

We can now find the density function by differentiating.

$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} F_X(h(y)) = F_X'(h(y)) \cdot h'(y) = f_X(h(y)) \cdot h'(y)$$


Case 2: g(x) is strictly decreasing on the sample space for X.
Let h(y) be the inverse function of g(x). The function h(y) will also be
strictly decreasing. In this case, we can find $F_Y(y)$ as follows.

$$\begin{aligned}
F_Y(y) &= P(Y \le y) = P(g(X) \le y) \\
&= P[h(g(X)) \ge h(y)] \\
&= P(X \ge h(y)) \\
&= S_X(h(y))
\end{aligned}$$

We can now find the density function by differentiating.

$$\begin{aligned}
f_Y(y) &= \frac{d}{dy} F_Y(y) = \frac{d}{dy} S_X(h(y)) = \frac{d}{dy}\left[1 - F_X(h(y))\right] \\
&= -F_X'(h(y)) \cdot h'(y) = -f_X(h(y)) \cdot h'(y)
\end{aligned}$$
Since h(y) is decreasing, its derivative is negative. Thus the final expression
in the preceding derivation is positive, and it can be written as

$$f_X(h(y)) \cdot (-h'(y)) = f_X(h(y)) \cdot |h'(y)|.$$


The final expression above also equals $f_Y(y)$ in Case 1, since $h'(y)$ is
positive in Case 1. We have derived a general expression for $f_Y(y)$
which holds in either case.


Density Function for Y = g(X)
Let g(x) be strictly increasing or strictly decreasing on the domain
consisting of the sample space, and let h(y) be its inverse function. Then

$$f_Y(y) = f_X(h(y)) \cdot |h'(y)|. \qquad (9.5a)$$


Example 9.13 In Example 9.6, $g(x) = \sqrt{x}$, for $x > 0$, and
$h(y) = y^2$, for $y > 0$. The random variable X was exponential with
$\lambda = 3$ and density function $f_X(x) = 3e^{-3x}$. If $Y = \sqrt{X} = g(X)$, then

$$f_Y(y) = f_X(y^2) \cdot |2y| = 3e^{-3y^2} \cdot 2y = 6y\,e^{-3y^2}, \quad y > 0.$$
□
Example 9.14 In Example 9.7, $g(x) = 1 - x$, for $x > 0$, and
$h(y) = 1 - y$, for $y < 1$. The random variable X was exponential with
$\lambda = 3$ and density function $f_X(x) = 3e^{-3x}$. If $Y = 1 - X = g(X)$, then

$$f_Y(y) = f_X(1 - y) \cdot |-1| = 3e^{-3(1-y)}, \quad y < 1.$$
□
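Formula (9.5a) can be checked numerically. In this hedged Python sketch, the helper `density_of_g` (a name chosen here for illustration) takes the density of X and the inverse function h, approximates $h'(y)$ by a central difference instead of differentiating symbolically, and multiplies as in (9.5a). Applied to Examples 9.13 and 9.14, it should agree with the closed forms $6y\,e^{-3y^2}$ and $3e^{-3(1-y)}$.

```python
import math


def density_of_g(f_x, h, y, eps=1e-6):
    """f_Y(y) = f_X(h(y)) * |h'(y)|, formula (9.5a).

    h is the inverse of g; h'(y) is approximated by a central
    difference rather than computed symbolically."""
    h_prime = (h(y + eps) - h(y - eps)) / (2 * eps)
    return f_x(h(y)) * abs(h_prime)


def f_x(x):
    """Exponential density with lambda = 3."""
    return 3 * math.exp(-3 * x) if x >= 0 else 0.0


def h_sqrt(y):
    """Inverse of g(x) = sqrt(x) (Example 9.13)."""
    return y * y


def h_reflect(y):
    """Inverse of g(x) = 1 - x (Example 9.14)."""
    return 1 - y


for y in (0.2, 0.5, 1.0):
    print(density_of_g(f_x, h_sqrt, y), 6 * y * math.exp(-3 * y * y))
for y in (-1.0, 0.0, 0.5):
    print(density_of_g(f_x, h_reflect, y), 3 * math.exp(-3 * (1 - y)))
```

The absolute value matters in the second case: the central difference for `h_reflect` is close to -1, and dropping `abs` would produce a negative "density".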
Some texts use a slightly different notation for this inverse function
formula. Since the inverse function gives x as a function of y, we
can write $x = h(y)$. Then the derivative of h(y) is written as

$$h'(y) = \frac{dx}{dy}.$$


Using this notation, our rule becomes the following:

Density Function for Y = g(X)
Let g(x) be strictly increasing or strictly decreasing. Then

$$f_Y(y) = f_X(h(y)) \cdot \left|\frac{dx}{dy}\right|. \qquad (9.5b)$$
9.4 Simulation of Continuous Distributions
9.4.1 The Inverse Cumulative Distribution Function Method 


The inverse cumulative distribution method (also known as the 
inverse transformation method) is the simplest of the many methods 
available for simulation of continuous random variables. If X is a 
continuous random variable with cumulative distribution function F(x), 
a randomly generated value of X can be obtained using the following 
steps: 


(1) Find the inverse function $F^{-1}(x)$ for F(x).
(2) Generate a random number u from [0, 1).
(3) The value $x = F^{-1}(u)$ is a randomly generated value of X.
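Steps (2) and (3) are mechanical once step (1) has been done by hand. A minimal Python sketch, assuming the inverse function is supplied as an ordinary function; `random.random()` returns values in [0, 1), matching step (2). The distribution in the usage line, $F(x) = x^3$ on [0, 1] with $F^{-1}(u) = u^{1/3}$ and mean 3/4, is a hypothetical illustration, not one from the text.

```python
import random


def inverse_transform_sample(inverse_cdf, n, seed=None):
    """Return n simulated values x = F^{-1}(u), with u drawn from [0, 1).

    Step (1), finding F^{-1}, is assumed to be done already and is
    passed in as the function inverse_cdf."""
    rng = random.Random(seed)
    return [inverse_cdf(rng.random()) for _ in range(n)]


# Hypothetical example: F(x) = x**3 on [0, 1], so F^{-1}(u) = u**(1/3).
sample = inverse_transform_sample(lambda u: u ** (1 / 3), 100_000, seed=42)
print(sum(sample) / len(sample))   # should be close to E(X) = 3/4
```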


This procedure requires that we find the inverse function $F^{-1}(x)$, and
this may be difficult to do. However, the method is simple to apply
when the inverse is easy to compute. This is illustrated in the next
example.

Example 9.15 Let X have the straight-line density function

$$f(x) = \begin{cases} \dfrac{x}{2}, & 0 \le x \le 2 \\ 0, & \text{otherwise.} \end{cases}$$


The graph of this straight-line density function is shown in the next 
figure. 


The cumulative distribution function F(x) is given by

$$F(x) = \begin{cases} 0, & x < 0 \\ \dfrac{x^2}{4}, & 0 \le x \le 2 \\ 1, & x > 2. \end{cases}$$

F(x) is strictly increasing on the interval [0, 2]. The inverse function is
$F^{-1}(u) = 2\sqrt{u}$, for $0 \le u \le 1$.


To generate values of X, we generate random numbers u from [0, 1) and
calculate $x = F^{-1}(u)$. The next table shows the result of generating 5
random numbers u and transforming them to values of X, $x = F^{-1}(u)$.

u              x = F^{-1}(u)
.15529095      0.7881395
.32379337      1.1380569
.1860507       0.8626719
.41523288      1.2887713
.21343523      0.923981
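The table can be reproduced with a few lines of Python. This sketch simply applies $F^{-1}(u) = 2\sqrt{u}$ to the five random numbers listed above; any difference in the last printed digit would only reflect rounding in the table.

```python
import math


def inverse_cdf(u: float) -> float:
    """F^{-1}(u) = 2*sqrt(u) for F(x) = x**2/4 on [0, 2] (Example 9.15)."""
    return 2 * math.sqrt(u)


# The five random numbers u from the table above.
us = [0.15529095, 0.32379337, 0.1860507, 0.41523288, 0.21343523]
xs = [inverse_cdf(u) for u in us]
for u, x in zip(us, xs):
    print(f"{u:<12} {x:.7f}")
```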


To illustrate how well this simulation method works, we generated 
1000 values of X. The next figure gives a bar graph showing the percent 
of simulated values in subintervals of [0, 2]. The bar graph displays the 
triangular shape of the density function. 


[Figure: Simulation Results]
The results above indicate that the method works fairly
well, but do not show why. A look at the graph of F(x) might help
give an intuitive understanding of the method.

The inverse function takes us from a value selected from [0, 1) (the range
of F) back to a value of x in the domain of F. As we pick values at
random from [0, 1) on the y-axis of the graph of F, the inverse procedure will
convert them into random values of X on the x-axis. The proof that the
procedure works is not given here. It relies on the fact that the
transformed random variable U = F(X) is uniform on [0, 1). This is covered
in Exercise 9-16. □


9.4.2 Using the Inverse Transformation Method 
to Simulate an Exponential Random Variable 


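To simulate an exponential random variable with parameter $\lambda$, it is
necessary to find the inverse of the cumulative distribution function
$F(x) = 1 - e^{-\lambda x}$. This is done by solving the equation $x = F(y)$ for y.

$$\begin{aligned}
x &= 1 - e^{-\lambda y} \\
e^{-\lambda y} &= 1 - x \\
-\lambda y &= \ln(1 - x) \\
y &= -\frac{1}{\lambda}\ln(1 - x) = F^{-1}(x)
\end{aligned}$$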
In the next table we show the result of transforming 5 random
numbers from [0, 1) into values of the exponential random variable X
with $\lambda = 2$. In this case

$$F^{-1}(u) = -\frac{1}{2}\ln(1 - u).$$

u           x = F^{-1}(u)
.407381     0.261602
.892484     1.115058
.297554     0.176593
.485448     0.332230
.798462     0.800889
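The exponential inverse can be coded directly from the derivation above. In this Python sketch, `math.log1p(-u)` is used to compute $\ln(1 - u)$, a standard-library choice that is slightly more accurate when u is near 0; with $\lambda = 2$ it reproduces the table.

```python
import math


def exponential_inverse_cdf(u: float, lam: float) -> float:
    """F^{-1}(u) = -ln(1 - u) / lam for F(x) = 1 - exp(-lam * x)."""
    return -math.log1p(-u) / lam   # log1p(-u) computes ln(1 - u)


us = [0.407381, 0.892484, 0.297554, 0.485448, 0.798462]
xs = [exponential_inverse_cdf(u, lam=2.0) for u in us]
for u, x in zip(us, xs):
    print(f"{u:<9} {x:.6f}")
```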


The graph below shows the results of 1000 trials in this simulation. 
The graph shows that the simulation produced values whose distribution 
approximated the shape of an exponential density function. 




9.4.3 Simulating Other Distributions 


The inverse transformation method can be applied to simulate other
distributions for which $F^{-1}(x)$ is easily found. Exercises 9-17 and 9-18 ask the
reader to do this for the uniform² and Pareto distributions. Unfortunately,
some useful distributions do not have closed forms for F(x) which allow
a simple solution for $F^{-1}(x)$. This is true in the case of the most widely
used distribution, the normal. Fortunately, other methods are available.
The inverse function can be approximated numerically, or entirely 
different methods can be used. Such work is beyond the scope of this 
course, but it is incorporated into computer technology that gives all of us 
the capability of generating values from a wide range of distributions. The 
spreadsheet EXCEL has inverse functions for the normal, gamma, beta 
and lognormal distributions. The statistics program MINITAB will 
generate random data from the uniform, normal, exponential, gamma, 
lognormal, Weibull and beta distributions. 
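Python's standard library also includes a numerical inverse of the normal cumulative distribution function, `statistics.NormalDist.inv_cdf` (available since Python 3.8). The sketch below, offered as one more illustration alongside the tools named above, feeds random numbers from [0, 1) through it; a draw of exactly 0 is skipped because `inv_cdf` requires 0 < p < 1. The mean 100 and standard deviation 15 are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean, stdev


def simulate_normal(mu: float, sigma: float, n: int, seed=None) -> list:
    """Inverse transformation with a numerically computed normal F^{-1}."""
    dist = NormalDist(mu, sigma)
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        u = rng.random()            # u in [0, 1)
        if u > 0.0:                 # inv_cdf is undefined at p = 0
            values.append(dist.inv_cdf(u))
    return values


sample = simulate_normal(100, 15, 50_000, seed=7)
print(round(fmean(sample), 2), round(stdev(sample), 2))
```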


9.5 Mixed Distributions 
9.5.1 An Insurance Example


In some situations, probability distributions are a combination of discrete 
and continuous distributions. The next example illustrates how this may 
happen naturally in insurance. 


Example 9.16 An insurance company has sold a warranty policy 
for appliance repair. 90% of the policyholders do not file a claim. 10% 
file a single claim. For those policyholders who file a claim, the amount 
paid for repair is uniformly distributed on (0, 1000]. In this situation, the 
probability distribution of the amount X paid to a randomly selected 
policyholder is mixed. The probability of no claim being filed is 
discrete, but the amount paid on a claim is continuous. Before we can 
describe the distribution of the amount X, we need to look more
carefully at its components. 

The discrete part of this problem is the distribution of N, the
number of claims paid. The distribution of N is shown in the following
table.

n          0      1
P(N = n)   .90    .10


² Note that the linear congruential generator used to produce random numbers in [0, 1)
is actually simulating a uniform distribution on [0, 1). The inverse transformation method 
can be used to simulate a uniform distribution on any other interval. 
The continuous distribution for claim amount applies only if we are
given that a claim has been filed. This is a conditional distribution. In
more formal terms,

$$P(X \le x \mid N = 1) = F(x \mid N = 1) = \frac{x}{1000}, \quad 0 < x \le 1000.$$


The insurance company needs to find the cumulative distribution function
$F(x) = P(X \le x)$ for X, the amount paid to any randomly selected
policyholder. This can be done in logical steps.

Case 1: $x < 0$. The amount paid cannot be negative. If $x < 0$,
$P(X \le x) = F(x) = 0$.

Case 2: $x = 0$. The probability that $X = 0$ is .90, the probability
that $N = 0$. Then $F(0) = P(X \le 0) = P(X = 0) = .90$.

Case 3: $0 < x < 1000$. This case requires a probability calculation.

$$\begin{aligned}
F(x) &= P(X \le x) = P[X = 0 \text{ or } 0 < X \le x] \\
&= P[(N = 0) \text{ or } (N = 1 \text{ and } X \le x)] \\
&= P(N = 0) + P(N = 1 \text{ and } X \le x) \\
&= P(N = 0) + P(X \le x \mid N = 1) \cdot P(N = 1) \\
&= .90 + .10\left(\frac{x}{1000}\right)
\end{aligned}$$

Case 4: $x \ge 1000$. All claims are less than or equal to 1000, so
$P(X \le x) = 1$.


We can now give a complete description of $F(x) = P(X \le x)$.

$$F(x) = \begin{cases} 0, & x < 0 \\ .90, & x = 0 \\ .90 + .10\left(\dfrac{x}{1000}\right), & 0 < x < 1000 \\ 1, & x \ge 1000 \end{cases}$$
The graph of F(x) on the interval [0, 1200] is shown below.


The cumulative distribution function can now be used to find probabilities
for X. For example,

$$P(X \le 500) = F(500) = .90 + .10\left(\frac{500}{1000}\right) = .95.$$

Care is necessary over the use of the relations < and ≤ because of the
mixture of discrete and continuous variables. The preceding probability
is not the same as $P(0 < X \le 500)$:

$$P(0 < X \le 500) = F(500) - F(0) = .95 - .90 = .05.$$
□
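The piecewise definition of F(x) translates directly into code. A hedged Python sketch (the function name `warranty_cdf` is just an illustrative choice) that reproduces the two probabilities computed above:

```python
def warranty_cdf(x: float) -> float:
    """F(x) = P(X <= x) for the amount paid in Example 9.16:
    a point mass of .90 at 0 plus .10 spread uniformly over (0, 1000]."""
    if x < 0:
        return 0.0
    if x < 1000:
        return 0.90 + 0.10 * (x / 1000)
    return 1.0


p_at_most_500 = warranty_cdf(500)                     # P(X <= 500)
p_between = warranty_cdf(500) - warranty_cdf(0)       # P(0 < X <= 500)
print(round(p_at_most_500, 2), round(p_between, 2))
```

The jump of .90 at 0 is the reason F(0), rather than 0, is subtracted when computing $P(0 < X \le 500)$.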
9.5.2 The Probability Function for a Mixed Distribution
It is usually easier to derive the cumulative distribution function F(x)
for a mixed distribution, but problems can also be stated using a mixed
probability function which is partly a discrete probability function and
partly a continuous probability density function. In the next example, we
find the combined probability function for the insurance problem.


Example 9.17 The probability function p(x) for Example 9.16 can 
also be found in logical steps. 


Case 1: $x < 0$. Values less than 0 are impossible, so $p(x) = 0$.

Case 2: $x = 0$. Since the probability of no claim is .90, we see
that $p(0) = .90$.
Case 3: $0 < x < 1000$. On this interval, X behaves as a continuous random
variable, and $p(x) = f(x)$ is the derivative of F(x). We can find f(x) for
this interval by differentiating the formula for F(x) on this interval.

$$f(x) = F'(x) = \frac{d}{dx}\left[.90 + .10\left(\frac{x}{1000}\right)\right] = .0001$$

Case 4: $x > 1000$. This is impossible, so $p(x) = 0$.

We can summarize the probability function in the following definition by
cases.

$$p(x) = \begin{cases} 0, & x < 0 \\ .90, & x = 0 \\ .0001, & 0 < x \le 1000 \\ 0, & x > 1000 \end{cases}$$


This mixed distribution is continuous on (0, 1000] and is said to have a
point mass at $x = 0$. It is graphed below, with the point mass indicated
by a heavy dot.


[Figure: Mixed Density Function]
9.5.3 The Expected Value of a Mixed Distribution 


For discrete distributions, the expected value was found by summation
over the probability function.

$$E(X) = \sum_x x \cdot p(x)$$

For continuous distributions, the expected value was found by integration
over the density function.

$$E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$$


For mixed distributions we can combine these ideas, sum where the 
random variable is discrete, and integrate where it is continuous. This is 
done in the next example. 


Example 9.18 For the insurance example, we can use the probability
function just derived.

$$E(X) = .90(0) + \int_0^{1000} x(.0001)\,dx = 50$$
□
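Summing over the point mass and integrating over the continuous part can be checked two ways in a short Python sketch: a midpoint-rule integral of $x(.0001)$ over (0, 1000] plus the point-mass term, and a direct simulation of the policy described in Example 9.16. The step count, seed, and sample size are arbitrary choices.

```python
import random


def mixed_expectation(n_steps: int = 10_000) -> float:
    """E(X) = .90(0) + integral of x * .0001 over (0, 1000], midpoint rule."""
    width = 1000 / n_steps
    integral = sum((i + 0.5) * width * 0.0001 for i in range(n_steps)) * width
    return 0.90 * 0 + integral


def simulate_amount(rng: random.Random) -> float:
    """One policyholder: no claim with probability .90, otherwise
    a payment uniform on (0, 1000)."""
    if rng.random() < 0.90:
        return 0.0
    return rng.uniform(0, 1000)


rng = random.Random(3)
sim_mean = sum(simulate_amount(rng) for _ in range(200_000)) / 200_000
print(mixed_expectation(), round(sim_mean, 1))
```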


9.5.4 A Lifetime Example 


In the next example, we will apply the reasoning used above to the life- 
time of a machine part. 


Example 9.19 When a new part is selected for installation, the 
part is first inspected. The probability that a part fails the inspection and 
is not used is .01. If a part passes inspection and is used, its lifetime is 
exponential with mean 100. Find the probability distribution of T, the
lifetime of a randomly selected part. 

Solution Let S be the event that a part passes inspection. Then
$P(S) = .99$ and $P(\sim S) = .01$. The given exponential distribution is the
conditional distribution of lifetime for a part that passes inspection.
Since the mean is 100, the parameter of the exponential distribution is
$\lambda = .01$.

$$P(T \le t \mid S) = 1 - e^{-.01t} = F(t \mid S), \quad t > 0$$

The cumulative distribution function $F(t) = P(T \le t)$ can be found in
steps, as before.

Case 1: $t < 0$. Values less than 0 are impossible, so $F(t) = 0$.

Case 2: $t = 0$. When a part fails inspection, it is not used and
$T = 0$. $F(0) = P(T \le 0) = P(T = 0) = .01$.


Applications for Continuous Random Variables 277 


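Case 3: $t > 0$.

$$\begin{aligned}
F(t) &= P(T \le t) = P(T = 0) + P(0 < T \le t) \\
&= P(\sim S) + P(S \text{ and } T \le t) \\
&= P(\sim S) + P(T \le t \mid S) \cdot P(S) \\
&= .01 + (1 - e^{-.01t})(.99)
\end{aligned}$$

Then F(t) is given by

$$F(t) = \begin{cases} 0, & t < 0 \\ .01, & t = 0 \\ .01 + .99(1 - e^{-.01t}), & t > 0. \end{cases}$$

The probability function is

$$p(t) = \begin{cases} 0, & t < 0 \\ .01, & t = 0 \\ .99(.01e^{-.01t}), & t > 0. \end{cases}$$
□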
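The mixed distribution of T can also be simulated directly from its description, which gives an independent check on F(t). In this Python sketch, a part fails inspection with probability .01 (lifetime 0); otherwise its lifetime is drawn with `random.expovariate(0.01)`, an exponential with mean 100. The seed and sample size are arbitrary.

```python
import math
import random


def lifetime_cdf(t: float) -> float:
    """F(t) from Example 9.19."""
    if t < 0:
        return 0.0
    return 0.01 + 0.99 * (1 - math.exp(-0.01 * t))


def simulate_lifetime(rng: random.Random) -> float:
    if rng.random() < 0.01:        # part fails inspection and is not used
        return 0.0
    return rng.expovariate(0.01)   # exponential lifetime with mean 100


def empirical_cdf(sample, t):
    """Fraction of simulated lifetimes that are <= t."""
    return sum(1 for x in sample if x <= t) / len(sample)


rng = random.Random(2024)
sample = [simulate_lifetime(rng) for _ in range(100_000)]
for t in (0, 50, 100, 200):
    print(t, round(lifetime_cdf(t), 4), round(empirical_cdf(sample, t), 4))
```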


9.6 Two Useful Identities 


In this section we will give two identities which are used in risk manage- 
ment applications. In each case, we will state the identity first, then give 
an application to illustrate its use and finish with a discussion of the 
derivation. 


9.6.1 Using the Hazard Rate to Find the Survival Function 


Let X be a random variable defined on $[0, \infty)$. If we are given the hazard
rate $\lambda(x)$, we can find the survival function S(x) using the identity

$$S(x) = e^{-\int_0^x \lambda(u)\,du}. \qquad (9.6)$$


Example 9.20 In Section 8.7.5, we showed that the hazard rate for
a Weibull distribution with parameters $\alpha$ and $\beta$ was

$$\lambda(x) = \alpha\beta x^{\alpha - 1}.$$

Then $\int_0^x \alpha\beta u^{\alpha-1}\,du = \beta u^{\alpha}\Big|_0^x = \beta x^{\alpha}$. The identity shows
that $S(x) = e^{-\beta x^{\alpha}}$. □
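Identity (9.6) can be checked numerically for the Weibull case. In this Python sketch the integral of the hazard rate is approximated by the midpoint rule (the helper name and step count are illustrative choices), using $\alpha = 2$ and $\beta = 3$, the Weibull parameters that appeared for $Y = \sqrt{X}$ in Example 9.6. The result should match $e^{-\beta x^{\alpha}}$.

```python
import math


def survival_from_hazard(hazard, x: float, n_steps: int = 10_000) -> float:
    """S(x) = exp(-integral_0^x hazard(u) du), identity (9.6);
    the integral is approximated by the midpoint rule."""
    width = x / n_steps
    integral = sum(hazard((i + 0.5) * width) for i in range(n_steps)) * width
    return math.exp(-integral)


ALPHA, BETA = 2.0, 3.0


def weibull_hazard(u: float) -> float:
    """lambda(u) = alpha * beta * u**(alpha - 1)."""
    return ALPHA * BETA * u ** (ALPHA - 1)


for x in (0.25, 0.5, 1.0):
    print(x, survival_from_hazard(weibull_hazard, x), math.exp(-BETA * x ** ALPHA))
```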
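To derive this identity, recall that

$$S'(x) = \frac{d}{dx}(1 - F(x)) = -f(x).$$

By definition,

$$\lambda(x) = \frac{f(x)}{S(x)} = \frac{-S'(x)}{S(x)} = -\frac{d}{dx}\ln S(x).$$

Then, since $S(0) = 1$ for a continuous random variable on $[0, \infty)$,

$$\int_0^x \lambda(u)\,du = -\ln S(u)\Big|_0^x = -\ln S(x) + \ln S(0) = -\ln S(x).$$

Thus

$$e^{-\int_0^x \lambda(u)\,du} = e^{\ln S(x)} = S(x).$$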


9.6.2 Finding E(X) Using S(x) 


Let X be a random variable defined on $[0, \infty)$. If we are given the
survival function $S(x) = 1 - F(x)$, we can find the expected value of X
using the identity

$$E(X) = \int_0^{\infty} S(x)\,dx = \int_0^{\infty} (1 - F(x))\,dx. \qquad (9.7)$$


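Example 9.21 In Section 8.2.4, we showed that the survival
function for an exponential random variable with parameter $\lambda$ was

$$S(x) = e^{-\lambda x},$$

for $x \ge 0$. Then

$$E(X) = \int_0^{\infty} e^{-\lambda x}\,dx = -\frac{e^{-\lambda x}}{\lambda}\Big|_0^{\infty} = 0 + \frac{1}{\lambda} = \frac{1}{\lambda}.$$
□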
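A numerical version of identity (9.7), as a hedged Python sketch: integrate S(x) by the midpoint rule on a truncated interval [0, upper], where the truncation point is an assumption that S is negligible beyond it. For the exponential with $\lambda = 3$ the result should be close to 1/3. As a second check, for the warranty payment of Example 9.16, $S(x) = .10(1 - x/1000)$ on [0, 1000), and the integral reproduces $E(X) = 50$ from Example 9.18; identity (9.7) also holds for nonnegative mixed random variables such as this one, although the derivation given here treats the continuous case.

```python
import math


def expectation_from_survival(survival, upper: float, n_steps: int = 200_000) -> float:
    """E(X) = integral_0^inf S(x) dx, identity (9.7), approximated by the
    midpoint rule on [0, upper]; assumes S(x) is negligible beyond upper."""
    width = upper / n_steps
    return sum(survival((i + 0.5) * width) for i in range(n_steps)) * width


lam = 3.0
exp_mean = expectation_from_survival(lambda x: math.exp(-lam * x), upper=20.0)
warranty_mean = expectation_from_survival(lambda x: 0.10 * (1 - x / 1000), upper=1000.0)
print(exp_mean, 1 / lam)
print(warranty_mean)
```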




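This identity is derived using integration by parts. The definition
of E(X) is

$$E(X) = \int_0^{\infty} x \cdot f(x)\,dx.$$

If we take

$$u = x, \qquad v = -(1 - F(x))$$
$$du = dx, \qquad dv = f(x)\,dx$$

we obtain

$$\begin{aligned}
E(X) &= -x(1 - F(x))\Big|_0^{\infty} + \int_0^{\infty} (1 - F(x))\,dx \\
&= 0 - 0 + \int_0^{\infty} (1 - F(x))\,dx = \int_0^{\infty} S(x)\,dx.
\end{aligned}$$

In this derivation, we have made use of the fact that

$$\lim_{x \to \infty} x(1 - F(x)) = 0.$$

This requires proof:

$$\begin{aligned}
\lim_{x \to \infty} x \cdot S(x) &= \lim_{x \to \infty} x \int_x^{\infty} f(y)\,dy \\
&= \lim_{x \to \infty} \int_x^{\infty} x \cdot f(y)\,dy \\
&\le \lim_{x \to \infty} \int_x^{\infty} y \cdot f(y)\,dy = 0.
\end{aligned}$$

The last equality above will hold if E(X) is defined, since

$$E(X) = \int_0^{\infty} y \cdot f(y)\,dy$$

is then finite, so its tail integrals tend to 0. Since $x \cdot S(x) \ge 0$, the limit is 0.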
Exercises

9.1 Expected Value of a Function of a Random Variable

9-1. Suppose the amount of a single loss for an insurance policy has
density function

$$f(x) = .001e^{-.001x}, \quad x > 0.$$

If this policy has a $300 per claim deductible, what is the expected
amount of a single claim for this policy?

9-2. If the policy in Exercise 9-1 also has a payment cap of $1500 per
claim, what is the expected amount of a single claim?

9-3. Work Example 9.4 using the utility function u(w) = ln(w).
What are E[u(W_1)] and E[u(W_2)]?

9.2 Moment Generating Functions of Continuous Random Variables

9-4. Let X be the random variable which is uniformly distributed
over the interval [a, b]. Find $M_X(t)$.

9-5. Find E(X) for the random variable in Exercise 9-4 using its
moment generating function.

9-6. Let X be the random variable whose density function is given by
$f(x) = 2(1 - x)$, for $0 \le x \le 1$, and $f(x) = 0$ elsewhere. Find
$M_X(t)$.

9-7. Find E(X) for the random variable in Exercise 9-6 using its
moment generating function. (Note: the derivative of $M_X(t)$ is not
defined at 0, but you can take the limit as t approaches 0 to find
E(X). This is a much more difficult way to find E(X) than
direct integration for this particular density function.)

9-8. If the moment generating function of X is (524)^5, identify the
random variable X.
9-9. If X is an exponential random variable with $\lambda = 3$, what is the
moment generating function of $Y = 2X + 5$?

9-10. Let X be the random variable whose moment generating function
is e+"). Find E(X) and V(X).

9-11. Let X be a normal random variable with parameters $\mu$ and $\sigma$.
Use the moment generating function for X to find $E(X^2)$. Then
show that $V(X) = \sigma^2$.

9.3 The Distribution of Y = g(X)

9-12. Let X be uniformly distributed over [0, 1] and $Y = e^X$. Find (a)
$F_Y(y)$; (b) $f_Y(y)$.

9-13. Let X be a random variable with density function given by
$f_X(x) = 3x^{-4}$, for $x > 1$ (Pareto with $\alpha = 3$, $\beta = 1$), and let
$Y = \ln X$. Find $F_Y(y)$.

9-14. If X is the random variable defined in Exercise 9-13 and
$Y = 1/X$, find (a) $F_Y(y)$; (b) $f_Y(y)$.

9-15. The monthly maintenance cost X of a machine is an exponential
random variable with unknown parameter. Studies have determined
that $P(X > 100) = .64$. For a second machine the cost Y
is a random variable such that $Y = 2X$. Find $P(Y > 100)$.

9.4 Simulation of Continuous Distributions

9-16. For a continuous random variable X, show that F(X) is uniformly
distributed over [0, 1] (i.e., show $P[F(X) \le x] = x$, for
$0 \le x \le 1$).
For Exercises 9-17 and 9-18, use the following sequence of random
numbers in [0, 1).

 1. .90463    6. .81008   11. .15533   16. .31239
 2. .17842    7. .49660   12. .29701   17. .68995
 3. .55660    8. .92602   13. .82751   18. .77787
 4. .55071    9. .71729   14. .67490   19. .66928
 5. .96216   10. .39443   15. .68556   20. .53100

9-17. Let X be uniformly distributed over [0, 4], and use the above
random numbers to simulate values of X. How many of the transformed
values $x = F^{-1}(u)$ are in each subinterval [0, 1), [1, 2),
[2, 3) and [3, 4)?

9-18. Let X have a Pareto distribution with $\alpha = 3$ and $\beta = 3$, and use
the above random numbers to simulate values of X. How many of the
transformed values $x = F^{-1}(u)$ are in each subinterval [3, 4),
[4, 5), [5, 6) and [6, ∞)?

9.5 Mixed Distributions

9-19. For a certain type of policy, an insurance company divides its
claims into two classes, minor and major. Last year 90 percent
of the policyholders filed no claims, 9 percent filed minor
claims, and 1 percent filed major claims. The amounts of the
minor claims were uniformly distributed over (0, 1,000], and the
major claims were uniformly distributed over (1,000, 10,000].
Find F(x), for $0 \le x \le 10{,}000$.

9-20. Find E(X) for the insurance policy in Exercise 9-19.

9-21. An auto insurance company issues a comprehensive policy with
a $200 deductible. Last year 90 percent of the policyholders
filed no claims (either no damage or damage less than the
deductible). For the 10 percent who filed claims, the claim
amount had a Pareto distribution with α = 3 and β = 200. If X
is the random variable of the amount paid by the insurer, what is
F(x), for x > 0?




9.6 Two Useful Identities

9-22. Let X be a random variable with hazard rate λ(x) = D, for
x ≥ 0. Find S(x).

9-23. Let X be a random variable with hazard rate λ(x) = o, for
0 < x < 100. Find S(x).

9-24. Let X be the random variable defined in Exercise 9-22. Use
Equation (9.7) to find E(X).

9-25. Let X be a random variable whose survival function is given
by S(x) = 1005, for 0 < x < 100, and S(x) = 0 for x > 100.
Use Equation (9.7) to find E(X).

9.8 Sample Exam Problems

9-26. An insurance policy pays for a random loss X subject to a
deductible of C, where $0 < C < 1$. The loss amount is modeled
as a continuous random variable with density function

$$f(x) = \begin{cases} 2x, & 0 < x < 1 \\ 0, & \text{otherwise.} \end{cases}$$

Given a random loss X, the probability that the insurance payment
is less than 0.5 is equal to 0.64.

Calculate C.

9-27. A manufacturer's annual losses follow a distribution with density
function

$$f(x) = \begin{cases} \dfrac{2.5(0.6)^{2.5}}{x^{3.5}}, & x > 0.6 \\ 0, & \text{otherwise.} \end{cases}$$

To cover its losses, the manufacturer purchases an insurance
policy with an annual deductible of 2.

What is the mean of the manufacturer's annual losses not paid by
the insurance policy?
9-28. An insurance policy is written to cover a loss, X, where X has a
uniform distribution on [0, 1000].

At what level must a deductible be set in order for the expected
payment to be 25% of what it would be with no deductible?

9-29. A piece of equipment is being insured against early failure. The
time from purchase until failure of the equipment is exponentially
distributed with mean 10 years. The insurance will pay an amount
x if the equipment fails during the first year, and it will pay 0.5x if
failure occurs during the second or third year. If failure occurs
after the first three years, no payment will be made.

At what level must x be set if the expected payment made under
this insurance is to be 1000?

9-30. A device that continuously measures and records seismic activity
is placed in a remote region. The time, T, to failure of this device
is exponentially distributed with mean 3 years. Since the device
will not be monitored during its first two years of service, the
time to discovery of its failure is $X = \max(T, 2)$.

Determine E[X].

9-31. An insurance policy reimburses a loss up to a benefit limit of 10.
The policyholder's loss, Y, follows a distribution with density
function:

$$f(y) = \begin{cases} \dfrac{2}{y^3}, & y > 1 \\ 0, & \text{otherwise.} \end{cases}$$

What is the expected value of the benefit paid under the insurance
policy?

9-32. The warranty on a machine specifies that it will be replaced at
failure or age 4, whichever occurs first. The machine's age at
failure, X, has density function

$$f(x) = \begin{cases} \dfrac{1}{5}, & 0 < x < 5 \\ 0, & \text{otherwise.} \end{cases}$$

Let Y be the age of the machine at the time of replacement.
Determine the variance of Y.
9-33. The owner of an automobile insures it against damage by
purchasing an insurance policy with a deductible of 250. In the
event that the automobile is damaged, repair costs can be
modeled by a uniform random variable on the interval (0, 1500).

Determine the standard deviation of the insurance payment in the
event that the automobile is damaged.

9-34. An insurance company sells an auto insurance policy that covers
losses incurred by a policyholder, subject to a deductible of 100.
Losses incurred follow an exponential distribution with mean 300.

What is the 95th percentile of actual losses that exceed the
deductible?

9-35. The time, T, that a manufacturing system is out of operation has
cumulative distribution function

$$F(t) = \begin{cases} 1 - \left(\dfrac{2}{t}\right)^2, & t > 2 \\ 0, & \text{otherwise.} \end{cases}$$

The resulting cost to the company is $Y = T^2$.

Determine the density function of Y, for $y > 4$.

9-36. An investment account earns an annual interest rate R that
follows a uniform distribution on the interval (0.04, 0.08). The
value of a 10,000 initial investment in this account after one year
is given by $V = 10{,}000e^R$.

Determine the cumulative distribution function, F(v), of V for
values of v that satisfy $0 < F(v) < 1$.

9-37. An actuary models the lifetime of a device using the random
variable $Y = 10X^{0.8}$, where X is an exponential random variable
with mean 1 year.

Determine the probability density function f(y), for $y > 0$, of
the random variable Y.

9-38. Let T denote the time in minutes for a customer service
representative to respond to 10 telephone inquiries. T is uniformly
distributed on the interval with endpoints 8 minutes and 12
minutes. Let R denote the average rate, in customers per minute,
at which the representative responds to inquiries.

Find the density function of the random variable R on the
interval $\frac{10}{12} \le r \le \frac{10}{8}$.

9-39. The monthly profit of Company I can be modeled by a continuous
random variable with density function f. Company II has a
monthly profit that is twice that of Company I.

Determine the probability density function of the monthly profit
of Company II.

9-40. A random variable X has the cumulative distribution function

$$F(x) = \begin{cases} 0, & x < 1 \\ \dfrac{x^2 - 2x + 2}{2}, & 1 \le x < 2 \\ 1, & x \ge 2. \end{cases}$$

Calculate the variance of X.
10.1 Joint Distributions for Discrete Random Variables
10.1.1 The Joint Probability Function 


We have already given an example of the probability distribution of X for
the value of a single investment asset. Most real investors own more 
than one asset. We will look at a simple example of an investor who 
owns two assets to show how things become more interesting when you 
have to keep track of more than one random variable. 


Example 10.1 An investor owns two assets. He is interested in 
the value of his investments in one year. The value of the first asset in 
one year is a random variable X, and the value of the second asset in one 
year is a random variable Y. It is not enough to know the separate 
probability distributions. The investor must study how the two assets 
behave together. This requires a joint probability distribution for X 
and Y. The following table gives this information. 


The possible values of X are 90, 100 and 110. The possible values of Y
are 0 and 10. The probabilities for all possible pairs of individual values
of x and y are given in the table. For example, the probability that
X = 90 and Y = 0 is .05. The probability values in this table define a
joint probability function p(x, y) for X and Y, where p(x, y) is the
probability that X = x and Y = y. This is written

$$p(x, y) = P(X = x, Y = y).$$


For example,

$$p(90, 0) = P(X = 90, Y = 0) = .05.$$

The information here is useful to the investor. For example, when X
assumes its lowest value, Y is more likely to assume its highest value.
We will discuss the use of this information further in later sections. □


Definition 10.1 Let X and Y be discrete random variables. The
joint probability function for X and Y is the function

$$p(x, y) = P(X = x, Y = y).$$

Note that the sum of all the probabilities in the table in Example
10.1 is 1.00. This must hold for any joint probability function.

$$\sum_x \sum_y p(x, y) = 1 \qquad (10.1)$$

Joint probability functions for discrete random variables are often given
in tables, but they may also be given by formulas.


Example 10.2 An analyst is studying the traffic accidents in two
adjacent towns. The random variable X represents the number of accidents
in a day in town A, and the random variable Y represents the
number of accidents in a day in town B. The joint probability function
for X and Y is given by

$$p(x, y) = \frac{e^{-2}}{x!\,y!}, \quad x = 0, 1, 2, \ldots \text{ and } y = 0, 1, 2, \ldots$$

The probability that on a given day there will be 1 accident in town A
and 2 accidents in town B is

$$p(1, 2) = \frac{e^{-2}}{1!\,2!} \approx .068.$$
□
The above probability function must satisfy the requirement
$\sum_x \sum_y p(x, y) = 1$. If a probability function is given in a problem in this
text, the reader may assume that this is true. For the above probability
function, it is not hard to prove that the sum of the probabilities is 1:
since $\sum_{k=0}^{\infty} 1/k! = e$, the double sum equals $e^{-2} \cdot e \cdot e = 1$.


10.1.2 Marginal Distributions for Discrete Random Variables 


Once we know the joint distribution of X and Y, we can find the 
probabilities for individual values of X and Y. This is illustrated in the 
next example. 


Example 10.3 The table of joint probabilities for the asset values 
in Example 10.1 is the following: 


The probability that X is 90 can be found by adding all joint probabilities
in the first column of the table above.

$$\begin{aligned}
P(X = 90) &= P(X = 90, Y = 0) + P(X = 90, Y = 10) \\
&= .05 + .15 = .20
\end{aligned}$$

The probabilities P(X = 100) and P(X = 110) can be found in the
same way. The probability that Y is 0 can be found by adding all the
joint probabilities in the first row of the table.

$$P(Y = 0) = .05 + .27 + .18 = .50$$
The probability that Y is 10 can be found in the same way. It is efficient
to display the probability function table with rows and columns added to
give the individual probability distributions of X and Y.

The individual distributions for the random variables X and Y are called
marginal distributions. □


Definition 10.2 The marginal probability functions of X and Y
are defined by the following:

$$p_X(x) = \sum_y p(x, y)$$

$$p_Y(y) = \sum_x p(x, y)$$


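Example 10.4 The joint probability function for numbers of accidents
in two towns in Example 10.2 was

$$p(x, y) = \frac{e^{-2}}{x!\,y!}.$$

The marginal probability functions are

$$p_X(x) = \sum_{y=0}^{\infty} \frac{e^{-2}}{x!\,y!} = \frac{e^{-2}}{x!}\sum_{y=0}^{\infty}\frac{1}{y!} = \frac{e^{-2}e}{x!} = \frac{e^{-1}}{x!}$$

and

$$p_Y(y) = \sum_{x=0}^{\infty} \frac{e^{-2}}{x!\,y!} = \frac{e^{-2}}{y!}\sum_{x=0}^{\infty}\frac{1}{x!} = \frac{e^{-2}e}{y!} = \frac{e^{-1}}{y!}.$$

Each marginal distribution is Poisson with $\lambda = 1$. □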
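The marginal computation, and the claim that the joint probabilities sum to 1, can be checked numerically. In this Python sketch the infinite sums are truncated at 40 terms per variable, an arbitrary cutoff at which the omitted terms are far below floating-point precision.

```python
import math


def joint_pmf(x: int, y: int) -> float:
    """p(x, y) = e^{-2} / (x! y!) from Example 10.2."""
    return math.exp(-2) / (math.factorial(x) * math.factorial(y))


def marginal_x(x: int, cutoff: int = 40) -> float:
    """p_X(x) = sum over y of p(x, y), truncated at y = cutoff."""
    return sum(joint_pmf(x, y) for y in range(cutoff + 1))


total = sum(joint_pmf(x, y) for x in range(41) for y in range(41))
print(round(joint_pmf(1, 2), 3), total)
for x in range(4):
    print(x, marginal_x(x), math.exp(-1) / math.factorial(x))
```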


Multivariate Distributions 291 


10.1.3 Using the Marginal Distributions 


Once the marginal distributions are known, we can use them to analyze 
the random variables X and Y separately if that is desired. 


Example 10.5 For the asset value joint distribution in Examples
10.1 and 10.3,
$$P(X \ge 100) = .60 + .20 = .80$$
and
$$P(Y > 0) = .50.$$
□

Example 10.6 For the accident number joint distribution in
Examples 10.2 and 10.4, both X and Y were Poisson with $\lambda = 1$. Thus

$$P(X = 2) = P(Y = 2) = \frac{e^{-1}}{2!}.$$
□


In the following examples, we will calculate the mean and vari- 
ance of the random variables in the last two examples. This information 
is important for future reference, since we will find these expectations 
by another method involving conditional distributions in Section 11.5. 


Example 10.7 For the asset value joint distribution in Examples
10.1 and 10.3,

$$E(X) = 90(.20) + 100(.60) + 110(.20) = 100$$

and

$$E(Y) = 0(.50) + 10(.50) = 5.$$

To find variances, we first calculate the second moments.

$$E(X^2) = 90^2(.20) + 100^2(.60) + 110^2(.20) = 10{,}040$$
$$E(Y^2) = 0^2(.50) + 10^2(.50) = 50$$

Then

$$V(X) = 10{,}040 - 100^2 = 40$$

and

$$V(Y) = 50 - 5^2 = 25.$$
□
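The joint table for the asset example is not reproduced in this text, but its six entries can be recovered from figures quoted in Examples 10.1, 10.3 and 10.7: p(90, 0) = .05 and P(X = 90) = .20 give p(90, 10) = .15; the first-row sum .05 + .27 + .18 gives p(100, 0) = .27 and p(110, 0) = .18 (assuming the columns are ordered 90, 100, 110); and P(X = 100) = .60, P(X = 110) = .20 give p(100, 10) = .33 and p(110, 10) = .02. Treat these values as a reconstruction. The Python sketch below uses exact fractions to compute the marginals of Definition 10.2 and the moments of Example 10.7.

```python
from fractions import Fraction

# Reconstructed joint probabilities p(x, y) for Example 10.1 (see above).
joint = {
    (90, 0): Fraction("0.05"), (100, 0): Fraction("0.27"), (110, 0): Fraction("0.18"),
    (90, 10): Fraction("0.15"), (100, 10): Fraction("0.33"), (110, 10): Fraction("0.02"),
}


def marginal(joint_pmf: dict, axis: int) -> dict:
    """Sum p(x, y) over the other variable: axis 0 gives p_X, axis 1 gives p_Y."""
    out = {}
    for pair, p in joint_pmf.items():
        out[pair[axis]] = out.get(pair[axis], Fraction(0)) + p
    return out


def mean_and_variance(pmf: dict) -> tuple:
    """Return (mean, variance) of a discrete probability function,
    using variance = E(X^2) - E(X)^2."""
    mean = sum(v * p for v, p in pmf.items())
    second_moment = sum(v * v * p for v, p in pmf.items())
    return mean, second_moment - mean ** 2


p_x, p_y = marginal(joint, 0), marginal(joint, 1)
assert sum(joint.values()) == 1            # property (10.1)
print({x: str(p) for x, p in p_x.items()}, {y: str(p) for y, p in p_y.items()})
print(mean_and_variance(p_x), mean_and_variance(p_y))
```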
Example 10.8 For the accident number joint distribution in
Examples 10.2 and 10.4, both X and Y were Poisson with $\lambda = 1$. Thus
$E(X) = E(Y) = V(X) = V(Y) = 1$. □


10.2 Joint Distributions for Continuous Random Variables 
10.2.1 Review of the Single Variable Case 


Probabilities for a continuous random variable X are found using a
probability density function f(x) with the following properties:

(i) $f(x) \ge 0$ for all x.

(ii) The total area bounded by the graph of $y = f(x)$ and the x-axis
is 1.00.
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

(iii) $P(a \le X \le b)$ is given by the area under $y = f(x)$ between
$x = a$ and $x = b$.
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

It is important to review these properties, since the joint probability 
density function will be defined in a similar manner. 


10.2.2 The Joint Probability Density Function 
for Two Continuous Random Variables 


Probabilities for a pair of continuous random variables X and Y must be
found using a continuous real-valued function of two variables f(x, y).
A function of two variables will define a surface in three dimensions. 
Probabilities will be calculated as volumes under this surface, and 
double integrals will be used in this calculation. 
Definition 10.3 The joint probability density function for two continuous random variables X and Y is a continuous, real-valued function f(x, y) satisfying the following properties:

(i) f(x, y) ≥ 0 for all x, y.

(ii) The total volume bounded by the graph of z = f(x, y) and the x-y plane is 1.00.

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1 \qquad (10.3)$$

(iii) P(a ≤ X ≤ b, c ≤ Y ≤ d) is given by the volume between the surface z = f(x, y) and the region in the x-y plane bounded by x = a, x = b, y = c and y = d.

$$P(a \le X \le b,\ c \le Y \le d) = \int_a^b \int_c^d f(x,y)\,dy\,dx \qquad (10.4)$$


Example 10.9 A company is studying the amount of sick leave 
taken by its employees. The company allows a maximum of 100 hours of 
paid sick leave in a year. The random variable X represents the leave 
time taken by a randomly selected employee last year. The random 
variable Y represents the leave time taken by the same employee this 
year. Each random variable is measured in hundreds of hours, e.g., 
X = .50 means that the employee took 50 hours last year. Thus X and 
Y assume values in the interval [0,1]. The joint probability density 
function for X and Y is 


$$f(x,y) = 2 - 1.2x - .8y, \quad 0 \le x \le 1,\ 0 \le y \le 1.$$


The surface is shown in the next figure. 
We will first verify that the total volume bounded by the surface and the x-y plane is 1.

$$\int_0^1\int_0^1 (2 - 1.2x - .8y)\,dx\,dy = \int_0^1 \left(2x - .6x^2 - .8xy\right)\Big|_{x=0}^{1}\,dy$$

$$= \int_0^1 (1.4 - .8y)\,dy = 1$$


To illustrate a basic probability calculation, we will find the probability that X > .50 and Y > .50. In the notation used in property (iii) of Definition 10.3, we need to find

$$P(.50 \le X \le 1.0,\ .50 \le Y \le 1.0) = \int_{.5}^{1}\int_{.5}^{1} f(x,y)\,dy\,dx$$

$$= \int_{.5}^{1}\int_{.5}^{1} (2 - 1.2x - .8y)\,dy\,dx$$

$$= \int_{.5}^{1} \left(2y - 1.2xy - .4y^2\right)\Big|_{y=.5}^{1}\,dx$$

$$= \int_{.5}^{1} (.7 - .6x)\,dx = .125.$$


The volume represented by this calculation is shown in the next figure. 
The region of integration for this probability calculation is the region in the x-y plane defined by R = {(x, y) | .50 ≤ x ≤ 1 and .50 ≤ y ≤ 1}. It is often helpful to include a separate figure for the region of integration. This is given below.


In this example, the random variables X and Y were limited to the interval [0, 1]. The next example gives random variables which assume values in [0, ∞). □
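Both double integrals in Example 10.9 can be approximated numerically. The sketch below is an illustration rather than the text's method: it uses a two-dimensional midpoint rule, and the helper names `f` and `double_integral` and the grid size `n` are our own choices. Because this density is linear in x and y, the midpoint rule reproduces both integrals up to floating-point rounding.

```python
def f(x: float, y: float) -> float:
    """Joint density of the sick leave variables (Example 10.9) on the unit square."""
    return 2 - 1.2 * x - 0.8 * y


def double_integral(g, a: float, b: float, c: float, d: float, n: int = 200) -> float:
    """Midpoint-rule approximation of the integral of g(x, y) over [a, b] x [c, d]."""
    hx = (b - a) / n
    hy = (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx  # midpoint of the i-th x-subinterval
        for j in range(n):
            y = c + (j + 0.5) * hy  # midpoint of the j-th y-subinterval
            total += g(x, y)
    return total * hx * hy


print(round(double_integral(f, 0, 1, 0, 1), 6))      # total volume under the surface
print(round(double_integral(f, 0.5, 1, 0.5, 1), 6))  # P(.5 <= X <= 1, .5 <= Y <= 1)
```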


Example 10.10 In Example 10.2, an analyst was studying the 
traffic accidents in two adjacent towns, A and B. That example gave the 
joint distribution of X and Y, the discrete random variables for the 
number of accidents in the two towns. In this example we look at the 
continuous random variables S and T, the time between accidents in 
towns A and B, respectively. The joint density function of S and T is 


$$f(s,t) = e^{-(s+t)}, \quad s \ge 0 \text{ and } t \ge 0.$$


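We will first check that the total volume under the surface is 1.00.

$$\int_0^{\infty}\int_0^{\infty} e^{-(s+t)}\,ds\,dt = \int_0^{\infty} e^{-t}\left(-e^{-s}\right)\Big|_{s=0}^{\infty}\,dt = \int_0^{\infty} e^{-t}\,dt = 1$$

The density function can now be used to calculate probabilities. For example, the probability that S ≤ 1 and T ≤ 2 is given by the following:

$$P(0 \le S \le 1,\ 0 \le T \le 2) = \int_0^2\int_0^1 e^{-(s+t)}\,ds\,dt$$

$$= \int_0^2 e^{-t}\left(-e^{-s}\right)\Big|_{s=0}^{1}\,dt$$

$$= \int_0^2 e^{-t}\left(1 - e^{-1}\right)dt$$

$$= \left(1 - e^{-1}\right)\left(1 - e^{-2}\right) \approx .547$$ □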
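As a numerical cross-check of the closed form $(1 - e^{-1})(1 - e^{-2})$, the sketch below approximates the same double integral over $0 \le s \le 1$, $0 \le t \le 2$. It is an illustration only; the midpoint rule and the grid size `n` are our own choices.

```python
import math


def f(s: float, t: float) -> float:
    """Joint density of the waiting times S and T (Example 10.10) for s, t >= 0."""
    return math.exp(-(s + t))


def double_integral(g, a: float, b: float, c: float, d: float, n: int = 400) -> float:
    """Midpoint-rule approximation of the integral of g(s, t) over [a, b] x [c, d]."""
    hs, ht = (b - a) / n, (d - c) / n
    return hs * ht * sum(
        g(a + (i + 0.5) * hs, c + (j + 0.5) * ht)
        for i in range(n)
        for j in range(n)
    )


closed_form = (1 - math.exp(-1)) * (1 - math.exp(-2))
numeric = double_integral(f, 0, 1, 0, 2)
print(round(closed_form, 4), round(numeric, 4))
```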


10.2.3 Marginal Distributions for Continuous Random Variables 


In Section 10.1.2, we found the discrete marginal distribution $p_X(x)$ by keeping the value of x fixed and adding the values of p(x, y) for all y. Similarly, $p_Y(y)$ was found by fixing y and adding over x values. These marginal probability functions are given by Equations (10.2a) and (10.2b).

For continuous random variables, the addition is performed continuously by integration. Thus the marginal distributions for a continuous joint distribution are defined by integrating over x or y instead of summing over x or y.


Definition 10.4 Let f(x, y) be the joint density function for the continuous random variables X and Y. Then the marginal density functions of X and Y are defined by the following:

$$f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy$$

$$f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx$$

The probability distributions of X and Y are referred to as the marginal distributions of X and Y.


Example 10.11 For the sick leave random variables of Example 10.9, the joint density function was f(x, y) = 2 − 1.2x − .8y, for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. The marginal density functions are

$$f_X(x) = \int_0^1 (2 - 1.2x - .8y)\,dy = \left(2y - 1.2xy - .4y^2\right)\Big|_{y=0}^{1} = 1.6 - 1.2x$$

and

$$f_Y(y) = \int_0^1 (2 - 1.2x - .8y)\,dx = \left(2x - .6x^2 - .8xy\right)\Big|_{x=0}^{1} = 1.4 - .8y.$$ □
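Definition 10.4 can be mirrored numerically: fix one variable and integrate the joint density over the other. This Python sketch uses a one-dimensional midpoint rule (our choice, not the text's method); the helper names are illustrative. It compares the numerical marginals with the closed forms 1.6 − 1.2x and 1.4 − .8y.

```python
def f(x: float, y: float) -> float:
    """Joint density from Example 10.9, valid for 0 <= x <= 1 and 0 <= y <= 1."""
    return 2 - 1.2 * x - 0.8 * y


def integrate(g, a: float, b: float, n: int = 1000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def marginal_x(x: float) -> float:
    """f_X(x): integrate the joint density over y for a fixed x."""
    return integrate(lambda y: f(x, y), 0.0, 1.0)


def marginal_y(y: float) -> float:
    """f_Y(y): integrate the joint density over x for a fixed y."""
    return integrate(lambda x: f(x, y), 0.0, 1.0)


for point in (0.0, 0.25, 0.5, 1.0):
    print(point, round(marginal_x(point), 6), round(1.6 - 1.2 * point, 6),
          round(marginal_y(point), 6), round(1.4 - 0.8 * point, 6))
```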


Example 10.12 For the joint distribution of waiting times for accidents in Example 10.10, the joint probability density function was $f(s,t) = e^{-(s+t)}$, for s ≥ 0 and t ≥ 0. The marginal density functions are

$$f_S(s) = \int_{-\infty}^{\infty} f(s,t)\,dt = \int_0^{\infty} e^{-(s+t)}\,dt = e^{-s}\int_0^{\infty} e^{-t}\,dt = e^{-s}$$

and

$$f_T(t) = \int_{-\infty}^{\infty} f(s,t)\,ds = \int_0^{\infty} e^{-(s+t)}\,ds = e^{-t}\int_0^{\infty} e^{-s}\,ds = e^{-t}$$

The marginal distributions of S and T are exponential with λ = 1. □


10.2.4 Using Continuous Marginal Distributions 


We can now use the continuous marginal distributions to study X and Y 
separately. 


Example 10.13 Let X be the number of sick leave hours last year 
and Y the number of sick leave hours this year from Example 10.9. We 
showed in Example 10.11 that 

$$f_X(x) = 1.6 - 1.2x, \quad 0 \le x \le 1$$

and

$$f_Y(y) = 1.4 - .8y, \quad 0 \le y \le 1.$$

We can now calculate probabilities of interest.

$$P(X > .50) = \int_{.5}^{1} (1.6 - 1.2x)\,dx = .35$$

$$P(Y > .50) = \int_{.5}^{1} (1.4 - .8y)\,dy = .40$$
For each year the above probability is the probability that the sick leave exceeds 50 hours. This probability has increased from last year to this year. We can see the same type of increase if we calculate expected values.

$$E(X) = \int_0^1 x\cdot f_X(x)\,dx = \int_0^1 (1.6x - 1.2x^2)\,dx = .40$$

$$E(Y) = \int_0^1 y\cdot f_Y(y)\,dy = \int_0^1 (1.4y - .8y^2)\,dy \approx .4333$$

The mean number of sick leave hours has increased from 40 to 43.33. □
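The four integrals of Example 10.13 can be approximated with the same one-dimensional midpoint rule. This sketch is illustrative (the helper names and the grid size `n` are our own); variables are in hundreds of hours, as in Example 10.9.

```python
def f_X(x: float) -> float:
    """Marginal density of X (last year's sick leave, hundreds of hours)."""
    return 1.6 - 1.2 * x


def f_Y(y: float) -> float:
    """Marginal density of Y (this year's sick leave, hundreds of hours)."""
    return 1.4 - 0.8 * y


def integrate(g, a: float, b: float, n: int = 1000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


p_x = integrate(f_X, 0.5, 1.0)                      # P(X > .50)
p_y = integrate(f_Y, 0.5, 1.0)                      # P(Y > .50)
mean_x = integrate(lambda x: x * f_X(x), 0.0, 1.0)  # E(X)
mean_y = integrate(lambda y: y * f_Y(y), 0.0, 1.0)  # E(Y)
print(round(p_x, 4), round(p_y, 4), round(mean_x, 4), round(mean_y, 4))
```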


Example 10.14 Let S and T be the accident waiting times in Example 10.12. The marginal distributions of S and T each have an exponential distribution with λ = 1. Thus E(S) = E(T) = 1 and
$P(S > 1) = P(T > 1) = e^{-1}$. □


10.2.5 More General Joint Probability Calculations 
In the previous examples, we have only used the joint density function to find the probability that X and Y lie within a rectangular region in the x-y plane.

$$P(a \le X \le b,\ c \le Y \le d) = \int_a^b\int_c^d f(x,y)\,dy\,dx$$

Integration of the joint density function can be used to find the probability that X and Y lie within a more general region R of the x-y plane, such as a triangle or a circle. We will not prove this, but will use this fact in applied problems. The general probability integral statement is

$$P\big((X,Y) \in R\big) = \iint_R f(x,y)\,dx\,dy.$$


The next example is typical of the kind of probability calculation 
which requires integration over a more general region. 
Example 10.15 Let X be the sick leave hours last year and Y the sick leave hours this year as given in Example 10.9. Suppose we wish to find the probability that an individual's sick leave hours are greater this year than last year. This is P(Y > X). Recall that the joint density of X and Y is nonzero only in the rectangular region of the x-y plane where 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. The region R where Y > X is the triangular half of that rectangle pictured below.


To find P(Y > X) we must integrate the density function over that region.

$$P(Y > X) = \iint_R f(x,y)\,dx\,dy$$

$$= \int_0^1\int_0^y (2 - 1.2x - .8y)\,dx\,dy$$

$$= \int_0^1 \left(2x - .6x^2 - .8xy\right)\Big|_{x=0}^{y}\,dy$$

$$= \int_0^1 (2y - 1.4y^2)\,dy = 1 - \frac{1.4}{3} = \frac{8}{15} \approx .53$$

The probability that the number of sick leave hours for an employee increases over the two years is .53. □
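Over the triangular region, the limits of the inner integral depend on y. The sketch below follows the same order of integration as the text (x from 0 to y, then y from 0 to 1). It is an illustration only; the nested midpoint rule and the grid size `n` are our own choices.

```python
def f(x: float, y: float) -> float:
    """Joint density from Example 10.9 on the unit square."""
    return 2 - 1.2 * x - 0.8 * y


def prob_y_greater_than_x(n: int = 400) -> float:
    """Approximate P(Y > X) as the iterated integral of f over 0 <= x <= y, 0 <= y <= 1.

    The outer integral over y and the inner integral over x (from 0 to y)
    both use the composite midpoint rule with n subintervals.
    """
    hy = 1.0 / n
    total = 0.0
    for j in range(n):
        y = (j + 0.5) * hy
        hx = y / n  # the inner interval [0, y] shrinks with y
        inner = hx * sum(f((i + 0.5) * hx, y) for i in range(n))
        total += inner
    return total * hy


print(round(prob_y_greater_than_x(), 4), round(8 / 15, 4))
```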
10.3 Conditional Distributions
10.3.1 Discrete Conditional Distributions 


We will illustrate conditional distributions by returning to our previous 
examples. 


Example 10.16 The joint probability function for the two assets in Examples 10.1 and 10.3 is given below (with marginals included).

y \ x | 90 | 100 | 110 | p_Y(y)
0 | .05 | .27 | .18 | .50
10 | .15 | .33 | .02 | .50
p_X(x) | .20 | .60 | .20 |

Suppose we are given that Y = 0. Then we can compute conditional probabilities for X based on this information.

$$P(X = 90 \mid Y = 0) = \frac{p(90,0)}{p_Y(0)} = \frac{.05}{.50} = .10$$

$$P(X = 100 \mid Y = 0) = \frac{p(100,0)}{p_Y(0)} = \frac{.27}{.50} = .54$$

$$P(X = 110 \mid Y = 0) = \frac{p(110,0)}{p_Y(0)} = \frac{.18}{.50} = .36$$

These values give a complete probability function p(x | Y = 0) for X, given the information that Y = 0.

x | 90 | 100 | 110
p(x|0) | .10 | .54 | .36

In this calculation, the conditional probabilities were obtained by dividing each joint probability in the first row of the table above by the marginal probability at the end of the first row. A similar procedure could be used for the second row to obtain the conditional distribution for X given that Y = 10.

x | 90 | 100 | 110
p(x|10) | .30 | .66 | .04

The two conditional distributions show that there is a useful relation between X and Y. When Y is low (Y = 0), X has a greater probability of assuming higher values; when Y is high (Y = 10), X has a greater probability of assuming lower values. Thus X and Y tend to offset each other's risk. □


The calculation technique used here is summarized in the follow- 
ing definition. 


Definition 10.5 The conditional probability function of X, given that Y = y, is given by

$$P(X = x \mid Y = y) = p(x \mid y) = \frac{p(x,y)}{p_Y(y)}.$$

Similarly, the conditional probability function of Y, given that X = x, is given by

$$P(Y = y \mid X = x) = p(y \mid x) = \frac{p(x,y)}{p_X(x)}.$$
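Definition 10.5 translates directly into code once the joint probability function is stored as a table. This Python sketch is illustrative: the dictionary layout and the function names are our own. The y = 0 entries of `joint` come from Example 10.16, and the y = 10 entries follow from $p(x \mid 10) = .30, .66, .04$ together with $p_Y(10) = .50$.

```python
from collections import defaultdict

# Joint probability function of the asset values: keys are (x, y) pairs.
joint = {
    (90, 0): 0.05, (100, 0): 0.27, (110, 0): 0.18,
    (90, 10): 0.15, (100, 10): 0.33, (110, 10): 0.02,
}


def marginal(joint: dict, axis: int) -> dict:
    """Sum joint probabilities over the other variable (axis 0 -> p_X, axis 1 -> p_Y)."""
    totals = defaultdict(float)
    for pair, p in joint.items():
        totals[pair[axis]] += p
    return dict(totals)


def conditional_x_given_y(joint: dict, y) -> dict:
    """p(x | y) = p(x, y) / p_Y(y), as in Definition 10.5."""
    p_y = marginal(joint, axis=1)[y]
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}


def conditional_y_given_x(joint: dict, x) -> dict:
    """p(y | x) = p(x, y) / p_X(x), as in Definition 10.5."""
    p_x = marginal(joint, axis=0)[x]
    return {y: p / p_x for (xx, y), p in joint.items() if xx == x}


print(conditional_x_given_y(joint, 0))   # conditional distribution of X given Y = 0
print(conditional_y_given_x(joint, 90))  # conditional distribution of Y given X = 90
```

The second call reproduces the calculation of Example 10.17 below.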


Example 10.17 The conditional probability function of Y, given that X = 90, is given by

$$P(Y = 0 \mid X = 90) = \frac{p(90,0)}{p_X(90)} = \frac{.05}{.20} = .25$$

and

$$P(Y = 10 \mid X = 90) = \frac{p(90,10)}{p_X(90)} = \frac{.15}{.20} = .75.$$ □
Example 10.18 In Example 10.2, the joint probability function for X and Y (the numbers of accidents in two towns) was given by

$$p(x,y) = \frac{e^{-2}}{x!\,y!}, \quad x = 0,1,2,\ldots \text{ and } y = 0,1,2,\ldots$$

In Example 10.4 we showed that the marginal probability functions were Poisson with λ = 1.

$$p_X(x) = \frac{e^{-1}}{x!} \qquad p_Y(y) = \frac{e^{-1}}{y!}$$
This enables us to compute conditional probability functions.

$$p(x \mid y) = \frac{p(x,y)}{p_Y(y)} = \frac{e^{-2}/(x!\,y!)}{e^{-1}/y!} = \frac{e^{-1}}{x!}$$

Thus the conditional distribution of X, given Y = y, is also Poisson with λ = 1. The conditional distribution of Y, given X = x, is also Poisson.

$$p(y \mid x) = \frac{e^{-1}}{y!}$$ □


10.3.2 Continuous Conditional Distributions 


Conditional distribution functions for two continuous random variables 
X and Y are defined using the pattern established for discrete random 
variables. 


Definition 10.6 Let X and Y be continuous random variables with joint density function f(x, y). The conditional density function for X, given that Y = y, is given by

$$f(x \mid Y = y) = f(x \mid y) = \frac{f(x,y)}{f_Y(y)}.$$

Similarly, the conditional density for Y, given that X = x, is given by

$$f(y \mid X = x) = f(y \mid x) = \frac{f(x,y)}{f_X(x)}.$$


Example 10.19 Let X be the sick leave hours last year and Y the sick leave hours this year from Example 10.9. The joint density and marginal density functions are

$$f(x,y) = 2 - 1.2x - .8y, \quad 0 \le x \le 1,\ 0 \le y \le 1,$$

$$f_X(x) = 1.6 - 1.2x, \quad 0 \le x \le 1,$$

and

$$f_Y(y) = 1.4 - .8y, \quad 0 \le y \le 1.$$

Using Definition 10.6, we can calculate the conditional densities.

$$f(x \mid y) = \frac{f(x,y)}{f_Y(y)} = \frac{2 - 1.2x - .8y}{1.4 - .8y}, \quad 0 \le x \le 1$$

$$f(y \mid x) = \frac{f(x,y)}{f_X(x)} = \frac{2 - 1.2x - .8y}{1.6 - 1.2x}, \quad 0 \le y \le 1$$

This enables us to calculate probabilities of interest. Suppose an individual had X = .10 (10 hours of sick leave last year). Then his conditional density for Y (the hours of sick leave this year) is

$$f(y \mid .10) = \frac{2 - 1.2(.10) - .8y}{1.6 - 1.2(.10)} = \frac{1.88 - .8y}{1.48}, \quad 0 \le y \le 1.$$

The probability that this individual has less than 40 hours of sick leave this year is P(Y < .40 | X = .10).

$$P(Y < .40 \mid X = .10) = \int_0^{.40} \frac{1.88 - .8y}{1.48}\,dy \approx .465$$ □
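The conditional probability above can be reproduced by dividing the joint density by the marginal and integrating over $0 \le y \le .40$. In this sketch the helper names are our own, and the midpoint rule is our choice of technique. It also checks that $f(y \mid .10)$ integrates to 1 over $0 \le y \le 1$, as any density must.

```python
def f_joint(x: float, y: float) -> float:
    """Joint density from Example 10.9."""
    return 2 - 1.2 * x - 0.8 * y


def f_X(x: float) -> float:
    """Marginal density of X from Example 10.11."""
    return 1.6 - 1.2 * x


def conditional_y_given_x(y: float, x: float) -> float:
    """f(y | x) = f(x, y) / f_X(x), as in Definition 10.6."""
    return f_joint(x, y) / f_X(x)


def integrate(g, a: float, b: float, n: int = 1000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


prob = integrate(lambda y: conditional_y_given_x(y, 0.10), 0.0, 0.40)
total = integrate(lambda y: conditional_y_given_x(y, 0.10), 0.0, 1.0)
print(round(prob, 3), round(total, 6))
```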
( | | ( 148 ) y 


Example 10.20 For the joint distribution of waiting times for accidents in Example 10.10, the joint probability density function and marginal density functions were

$$f(s,t) = e^{-(s+t)}, \quad s \ge 0,\ t \ge 0,$$

$$f_S(s) = e^{-s}, \quad s \ge 0,$$

and

$$f_T(t) = e^{-t}, \quad t \ge 0.$$

The conditional densities are identical with the marginal densities.

$$f(s \mid t) = \frac{f(s,t)}{f_T(t)} = \frac{e^{-(s+t)}}{e^{-t}} = e^{-s}, \quad s \ge 0$$

$$f(t \mid s) = \frac{f(s,t)}{f_S(s)} = \frac{e^{-(s+t)}}{e^{-s}} = e^{-t}, \quad t \ge 0$$ □




10.3.3 Conditional Expected Value 


Once the conditional distribution is known, we can compute conditional expectations. For discrete random variables we have the following:

$$E(Y \mid X = x) = \sum_y y\cdot p(y \mid x) \qquad (10.6a)$$

$$E(X \mid Y = y) = \sum_x x\cdot p(x \mid y) \qquad (10.6b)$$


Example 10.21 Let X and Y be the asset value random variables of Example 10.1. The conditional distribution of X, given that Y = 0, was found in Example 10.16.

x | 90 | 100 | 110
p(x|0) | .10 | .54 | .36

The conditional expected value of X, given that Y = 0, is

$$E(X \mid Y = 0) = 90(.10) + 100(.54) + 110(.36) = 102.60.$$ □
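Equation (10.6b) is an ordinary expected value computed from the conditional probability function. This short sketch is illustrative; the function name `conditional_mean` is our own.

```python
def conditional_mean(pmf: dict) -> float:
    """E(X | Y = y) = sum over x of x * p(x | y), as in Equation (10.6b)."""
    return sum(x * p for x, p in pmf.items())


# Conditional distribution of X given Y = 0, from Example 10.16
p_x_given_y0 = {90: 0.10, 100: 0.54, 110: 0.36}
print(round(conditional_mean(p_x_given_y0), 2))
```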


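When X and Y are continuous, the conditional expected values are found by integration, rather than summation.

$$E(Y \mid X = x) = \int_{-\infty}^{\infty} y\cdot f(y \mid x)\,dy \qquad (10.7a)$$

$$E(X \mid Y = y) = \int_{-\infty}^{\infty} x\cdot f(x \mid y)\,dx \qquad (10.7b)$$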


Example 10.22 Let X be the sick leave hours last year and Y the sick leave hours this year from Example 10.9. The conditional density function of Y, given X = .10, is

$$f(y \mid .10) = \frac{1.88 - .8y}{1.48}, \quad 0 \le y \le 1.$$

The conditional expected value, given that X = .10, is found by using Equation (10.7a).

$$E(Y \mid X = .10) = \int_{-\infty}^{\infty} y\cdot f(y \mid .10)\,dy = \int_0^1 y\left(\frac{1.88 - .8y}{1.48}\right)dy \approx .455$$ □
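Equation (10.7a) can be checked numerically for this conditional density. This sketch is illustrative; the midpoint rule and the helper names are our own choices. The exact value of the integral is $(.94 - .8/3)/1.48$.

```python
def conditional_density(y: float) -> float:
    """f(y | .10) = (1.88 - .8y) / 1.48 for 0 <= y <= 1 (Example 10.19)."""
    return (1.88 - 0.8 * y) / 1.48


def integrate(g, a: float, b: float, n: int = 1000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


cond_mean = integrate(lambda y: y * conditional_density(y), 0.0, 1.0)  # Equation (10.7a)
print(round(cond_mean, 3))
```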


Conditional variances can also be defined. There are some inter- 
esting applications of conditional expected values and variances. These 
will be discussed in Section 11.5. 


10.4 Independence for Random Variables 
10.4.1 Independence for Discrete Random Variables 


We have already discussed independence of events. When two events A and B are independent, then P(A ∩ B) = P(A) · P(B). The definition of independence for two discrete random variables relies on this multiplication rule. If the events X = x and Y = y are independent, then P(X = x and Y = y) = P(X = x) · P(Y = y).


Definition 10.7 Two discrete random variables X and Y are independent if

$$p(x,y) = p_X(x)\cdot p_Y(y)$$

for all pairs of outcomes (x, y).


Example 10.23 A gambler is betting that a fair coin will come up 
heads when it is tossed. If the coin comes up heads, he gets $1; 
otherwise he must pay $1. He bets on two consecutive tosses. X is the 
amount won or paid on the first toss, and Y is the corresponding amount 
for the second toss. The joint distribution for X and Y is given below with marginal distributions.

y \ x | −1 | 1 | p_Y(y)
−1 | .25 | .25 | .50
1 | .25 | .25 | .50
p_X(x) | .50 | .50 |

The values of p(x, y) in this table were constructed using the multiplication rule, since we know that successive coin tosses are independent. Definition 10.7 is satisfied, and X and Y are independent random variables.

In this betting example, joint distribution functions were constructed by the multiplication rule because the events involved were known to be independent. We can also look at joint distributions which have already been constructed and use the definition to check for independence. □


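Example 10.24 The joint probability function for the two assets in Examples 10.1 and 10.3 is given below (with marginals included).

y \ x | 90 | 100 | 110 | p_Y(y)
0 | .05 | .27 | .18 | .50
10 | .15 | .33 | .02 | .50
p_X(x) | .20 | .60 | .20 |

Note that p(90, 0) = .05 and $p_X(90)\cdot p_Y(0) = .20(.50) = .10$. Since these differ, the random variables X and Y are not independent. □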
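Checking Definition 10.7 by hand means comparing every joint probability with the product of its marginals. The sketch below automates that comparison. It is illustrative: the function names and the tolerance `tol` (used only to absorb floating-point rounding) are our own choices, and pairs missing from the dictionary are treated as having probability 0.

```python
import math
from collections import defaultdict


def marginals(joint: dict) -> tuple:
    """Return (p_X, p_Y) computed from a joint probability function {(x, y): p}."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return dict(p_x), dict(p_y)


def is_independent(joint: dict, tol: float = 1e-9) -> bool:
    """Check Definition 10.7: p(x, y) = p_X(x) * p_Y(y) for every pair (x, y)."""
    p_x, p_y = marginals(joint)
    return all(
        math.isclose(joint.get((x, y), 0.0), p_x[x] * p_y[y], abs_tol=tol)
        for x in p_x
        for y in p_y
    )


assets = {
    (90, 0): 0.05, (100, 0): 0.27, (110, 0): 0.18,
    (90, 10): 0.15, (100, 10): 0.33, (110, 10): 0.02,
}
coin = {(-1, -1): 0.25, (-1, 1): 0.25, (1, -1): 0.25, (1, 1): 0.25}

print(is_independent(assets))  # the assets of Example 10.24
print(is_independent(coin))    # the coin tosses of Example 10.23
```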


Example 10.25 In Example 10.2, the joint probability function and marginals for X and Y (the numbers of accidents in two towns) were

$$p(x,y) = \frac{e^{-2}}{x!\,y!}, \quad x = 0,1,2,\ldots \text{ and } y = 0,1,2,\ldots,$$

$$p_X(x) = \frac{e^{-1}}{x!},$$

and

$$p_Y(y) = \frac{e^{-1}}{y!}.$$

In this case, $p(x,y) = p_X(x)\cdot p_Y(y)$, and X and Y are independent. (This is probably a reasonable assumption to make about numbers of accidents in two different towns.) □


In Example 10.18 we found the conditional distributions for the independent accident numbers X and Y. We showed that these conditional distributions were the same as the marginal distributions. This is an identity that holds in general for independent random variables X and Y.


Conditional Discrete Distributions for Independent X and Y

$$p(x \mid y) = p_X(x) \qquad (10.8a)$$

$$p(y \mid x) = p_Y(y) \qquad (10.8b)$$

This follows directly from the definitions of independence and the conditional distribution.

$$p(x \mid y) = \frac{p(x,y)}{p_Y(y)} \overset{\text{independence}}{=} \frac{p_X(x)\,p_Y(y)}{p_Y(y)} = p_X(x)$$
10.4.2 Independence for Continuous Random Variables 


The definition of independence for continuous random variables is the 
natural modification of the definition for the discrete case. 


Definition 10.8 Two continuous random variables X and Y are independent if

$$f(x,y) = f_X(x)\cdot f_Y(y)$$

for all pairs (x, y).


Example 10.26 Let X be the sick leave hours last year and Y the sick leave hours this year from Example 10.9. The joint density and marginal density functions are

$$f(x,y) = 2 - 1.2x - .8y, \quad 0 \le x \le 1,\ 0 \le y \le 1,$$

$$f_X(x) = 1.6 - 1.2x, \quad 0 \le x \le 1,$$

and

$$f_Y(y) = 1.4 - .8y, \quad 0 \le y \le 1.$$

X and Y are not independent, since $f(x,y) \ne f_X(x)\cdot f_Y(y)$. For example, at the point (0, 0), f(0, 0) = 2, while $f_X(0)\cdot f_Y(0) = 1.6(1.4) = 2.24$. □
Example 10.27 For the joint distribution of waiting times for accidents in Example 10.10, the joint probability density function and marginal density functions were

$$f(s,t) = e^{-(s+t)}, \quad s \ge 0,\ t \ge 0,$$

$$f_S(s) = e^{-s}, \quad s \ge 0,$$

and

$$f_T(t) = e^{-t}, \quad t \ge 0.$$

In this case $f(s,t) = f_S(s)\cdot f_T(t)$, and S and T are independent. (This is also a reasonable assumption to make about time between accidents in two different towns.) □
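Definition 10.8 requires the factorization at every point, so a check on a finite grid cannot prove independence. However, a single point where the two sides differ is enough to rule it out. This sketch compares $f(x,y)$ with $f_X(x)\,f_Y(y)$ on a small grid for the two examples above; the grid and the function name `max_gap` are our own choices.

```python
import math


def max_gap(joint, f_first, f_second, grid) -> float:
    """Largest |f(x, y) - f_X(x) * f_Y(y)| over the grid points (x, y)."""
    return max(abs(joint(x, y) - f_first(x) * f_second(y)) for x in grid for y in grid)


grid = [k / 4 for k in range(5)]  # 0, .25, .5, .75, 1

# Sick leave densities (Examples 10.9 and 10.11)
sick_gap = max_gap(
    lambda x, y: 2 - 1.2 * x - 0.8 * y,
    lambda x: 1.6 - 1.2 * x,
    lambda y: 1.4 - 0.8 * y,
    grid,
)

# Accident waiting-time densities (Examples 10.10 and 10.12)
wait_gap = max_gap(
    lambda s, t: math.exp(-(s + t)),
    lambda s: math.exp(-s),
    lambda t: math.exp(-t),
    grid,
)

print(round(sick_gap, 4), wait_gap)
```

A clearly nonzero `sick_gap` rules out independence for the sick leave variables. A `wait_gap` at the level of rounding error is consistent with, but does not by itself prove, the independence of S and T.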


As in the discrete case, the conditional distributions for independent random variables X and Y are the same as their marginal distributions.


Conditional Continuous Distributions for Independent X and Y

$$f(x \mid y) = f_X(x) \qquad (10.9a)$$

$$f(y \mid x) = f_Y(y) \qquad (10.9b)$$


10.5 The Multinomial Distribution 


In this chapter we have studied bivariate distributions. In many cases 
there are more than two variables and we have a true multivariate 
distribution. We will illustrate this by looking at the widely used 
multinomial distribution. 


The multinomial distribution will remind you of the binomial distribution, and the binomial distribution is a special case of it. Before starting, we will review the partition counting formula (Formula 2.10 of Chapter 2).


Counting Partitions

The number of partitions of n objects into k distinct groups of sizes $n_1, n_2, \ldots, n_k$ is given by

$$\binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1!\,n_2!\cdots n_k!}$$

Suppose that a random experiment has k mutually exclusive and exhaustive outcomes $E_1, \ldots, E_k$, with $P(E_i) = p_i$. Suppose that you repeat this experiment in n independent trials. Let $X_i$ be the number of times that the outcome $E_i$ occurs in the n trials. Then

$$P(X_1 = n_1, X_2 = n_2, \ldots, X_k = n_k) = \binom{n}{n_1, n_2, \ldots, n_k}\, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$$

where $n_1 + n_2 + \cdots + n_k = n$.


Example 10.28 You are spinning a spinner that can land on three colors: red, blue and yellow. For this spinner P(red) = .4, P(blue) = .35, and P(yellow) = .25. You spin the spinner 10 times. What is the probability that you spin red five times, blue three times and yellow two times?


Solution There are k = 3 mutually exclusive outcomes. Let $X_1$, $X_2$ and $X_3$ be the number of times the spinner comes up red, blue, and yellow, respectively. Then $p_1 = P(\text{red}) = .4$, $p_2 = P(\text{blue}) = .35$, and $p_3 = P(\text{yellow}) = .25$. We need to find

$$P(X_1 = 5, X_2 = 3, X_3 = 2) = \frac{10!}{5!\,3!\,2!}(.4^5)(.35^3)(.25^2) = 2520(.4^5)(.35^3)(.25^2) \approx .0691$$

□
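The multinomial formula can be sketched as a short function. This is an illustration; the name `multinomial_pmf` and its list arguments are our own choices. It computes the partition count with exact integer arithmetic and then multiplies by the probabilities. With k = 2 it reduces to the binomial probability function, which gives a quick sanity check.

```python
import math


def multinomial_pmf(counts: list, probs: list) -> float:
    """P(X_1 = n_1, ..., X_k = n_k) for n = n_1 + ... + n_k independent trials."""
    n = sum(counts)
    coefficient = math.factorial(n)
    for count in counts:
        coefficient //= math.factorial(count)  # each division leaves an exact integer
    probability = float(coefficient)
    for count, p in zip(counts, probs):
        probability *= p ** count
    return probability


# Example 10.28: 5 red, 3 blue, 2 yellow in 10 spins
print(round(multinomial_pmf([5, 3, 2], [0.4, 0.35, 0.25]), 4))

# Binomial special case: 3 heads and 2 tails in 5 tosses of a fair coin
print(multinomial_pmf([3, 2], [0.5, 0.5]))
```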


The sample exam problem 10-28 uses the multinomial distribution.
10.6 Exercises

10.1 Joint Distributions for Discrete Random Variables

10-1. Let p(x, y) = (x + y)/21, for x = 1, 2, 3 and y = 1, 2, be the joint probability function for the random variables X and Y. Construct a table of the joint probabilities of X and Y and the marginal probabilities of X and Y.

10-2. A company has 5 CPA's, 3 actuaries, and 2 economists. Two of these 10 professionals are selected at random to prepare a report. Let X be the random variable for the number of CPA's chosen and let Y be the random variable for the number of actuaries chosen. Construct a table of the joint probabilities for X and Y and the marginal probabilities of X and Y.

10-3. For the random variables in Exercise 10-1, find E(X) and E(Y).

10-4. For the random variables in Exercise 10-2, find E(X) and E(Y).

10-5. For the random variables in Exercise 10-2, find V(X) and V(Y).

10.2 Joint Distributions for Continuous Random Variables

10-6. Show that the function f(x, y) = i + 5 + 5 + xy, for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, is a joint probability density function. Find P(0 ≤ X ≤ .50, .50 ≤ Y ≤ 1).

10-7. For the joint density function in Exercise 10-6, find (a) $f_X(x)$; (b) $f_Y(y)$.

10-8. Let $f(x,y) = 2x^2 + 3y$, for 0 ≤ y ≤ x ≤ 1. Find (a) $f_X(x)$; (b) $f_Y(y)$.

10-9. For the joint density function in Exercise 10-8, use the marginal distributions to find (a) P(X > .50); (b) P(Y > .50).

10-10. For the joint density function in Exercise 10-6, find E(X).
10-11. For the joint density function in Exercise 10-6, find P(X > Y).

10-12. For the joint density function in Exercise 10-8, find E(X) and E(Y).

10-13. An auto insurance company separates its comprehensive claims into two parts: losses due to glass breakage and losses due to other damage. If X is the random variable for losses due to glass breakage and Y the random variable for other damage, f(x, y) = (30 − x − y)/1875, for 0 ≤ x ≤ 5, 0 ≤ y ≤ 25, where x and y are in hundreds of dollars. Find P(X > 4, Y > 20).

10-14. For the random variables in Exercise 10-13, find (a) $f_X(x)$; (b) $f_Y(y)$.

10-15. For the random variables in Exercise 10-13, find E(X) and E(Y).

10.3 Conditional Distributions

Exercises 10-16, 10-17 and 10-18 refer to Exercise 10-1.

10-16. Find P(X | Y = 1).

10-17. Find P(Y | X = 1).

10-18. Find E(X | Y = 1).

10-19. For the joint density function in Exercise 10-6, find f(x | y).

10-20. For the joint density function in Exercise 10-8, find f(y | x).

10-21. For the conditional density function in Exercise 10-20, find (a) f(y | .50); (b) E(Y | X = .50).

10-22. If f(x, y) = 6x, for 0 < x < y < 1 and 0 elsewhere, find (a) $f_Y(y)$; (b) f(x | y); (c) E(X | Y = y); (d) E(X | Y = .50).
10.4 Independence for Random Variables

10-23. Determine if the random variables in Exercise 10-1 are dependent or independent.

10-24. Determine if the random variables in Exercise 10-2 are dependent or independent.

10-25. Determine if the random variables in Exercise 10-6 are dependent or independent.

10-26. Determine if the random variables in Exercise 10-8 are dependent or independent.

10.7 Sample Actuarial Examination Problems

10-27. A doctor is studying the relationship between blood pressure and heartbeat abnormalities in her patients. She tests a random sample of her patients and notes their blood pressures (high, low, or normal) and their heartbeats (regular or irregular). She finds that:

(i) 14% have high blood pressure.

(ii) 22% have low blood pressure.

(iii) 15% have an irregular heartbeat.

(iv) Of those with an irregular heartbeat, one-third have high blood pressure.

(v) Of those with normal blood pressure, one-eighth have an irregular heartbeat.

What portion of the patients selected have a regular heartbeat and low blood pressure?

10-28. A large pool of adults earning their first driver's license includes 50% low-risk drivers, 30% moderate-risk drivers, and 20% high-risk drivers. Because these drivers have no prior driving record, an insurance company considers each driver to be randomly selected from the pool. This month, the insurance company writes 4 new policies for adults earning their first driver's license.

What is the probability that these 4 will contain at least two more high-risk drivers than low-risk drivers?
10-29. A device runs until either of two components fails, at which point the device stops running. The joint density function of the lifetimes of the two components, both measured in hours, is

$$f(x,y) = \frac{x + y}{8} \quad \text{for } 0 < x < 2 \text{ and } 0 < y < 2$$

What is the probability that the device fails during its first hour of operation?

10-30. A device runs until either of two components fails, at which point the device stops running. The joint density function of the lifetimes of the two components, both measured in hours, is

$$f(x,y) = \frac{x + y}{27} \quad \text{for } 0 < x < 3 \text{ and } 0 < y < 3$$

Calculate the probability that the device fails during its first hour of operation.

10-31. A device contains two components. The device fails if either component fails. The joint density function of the lifetimes of the components, measured in hours, is f(s, t), where 0 < s < 1 and 0 < t < 1.

Express the probability that the device fails during the first half hour of operation as a double integral.

10-32. The future lifetimes (in months) of two components of a machine have the following joint density function:

$$f(x,y) = \begin{cases} \dfrac{6}{125{,}000}(50 - x - y) & \text{for } 0 < x < 50 - y < 50 \\ 0 & \text{otherwise} \end{cases}$$

What is the probability that both components are still functioning 20 months from now? Express your answer as a double integral, but do not evaluate it.
10-33. An insurance company sells two types of auto insurance policies: Basic and Deluxe. The time until the next Basic Policy claim is an exponential random variable with mean two days. The time until the next Deluxe Policy claim is an independent exponential random variable with mean three days.

What is the probability that the next claim will be a Deluxe Policy claim?

10-34. Two insurers provide bids on an insurance policy to a large company. The bids must be between 2000 and 2200. The company decides to accept the lower bid if the two bids differ by 20 or more. Otherwise, the company will consider the two bids further.

Assume that the two bids are independent and are both uniformly distributed on the interval from 2000 to 2200.

Determine the probability that the company considers the two bids further.

10-35. A car dealership sells 0, 1, or 2 luxury cars on any day. When selling a car, the dealer also tries to persuade the customer to buy an extended warranty for the car.

Let X denote the number of luxury cars sold in a given day, and let Y denote the number of extended warranties sold.

P(X = 0, Y = 0) = 1/6
P(X = 1, Y = 0) = 1/12
P(X = 1, Y = 1) = 1/6
P(X = 2, Y = 0) = 1/12
P(X = 2, Y = 1) = 1/3
P(X = 2, Y = 2) = 1/6

What is the variance of X?
10-36. Let X and Y be continuous random variables with joint density function

$$f(x,y) = \begin{cases} 24xy & \text{for } 0 < x < 1 \text{ and } 0 < y < 1 - x \\ 0 & \text{otherwise.} \end{cases}$$

Find $P\left(Y < X \mid X = \tfrac{1}{3}\right)$.

10-37. Once a fire is reported to a fire insurance company, the company makes an initial estimate, X, of the amount it will pay to the claimant for the fire loss. When the claim is finally settled, the company pays an amount, Y, to the claimant. The company has determined that X and Y have the joint density function

$$f(x,y) = \frac{2}{x^2(x-1)}\,y^{-(2x-1)/(x-1)}, \quad x > 1,\ y > 1.$$

Given that the initial claim estimated by the company is 2, determine the probability that the final settlement amount is between 1 and 3.

10-38. A company offers a basic life insurance policy to its employees, as well as a supplemental life insurance policy. To purchase the supplemental policy, an employee must first purchase the basic policy.

Let X denote the proportion of employees who purchase the basic policy, and Y the proportion of employees who purchase the supplemental policy. Let X and Y have the joint density function f(x, y) = 2(x + y) on the region where the density is positive.

Given that 10% of the employees buy the basic policy, what is the probability that fewer than 5% buy the supplemental policy?
10-39. Two life insurance policies, each with a death benefit of 10,000 and a one-time premium of 500, are sold to a couple, one for each person. The policies will expire at the end of the tenth year. The probability that only the wife will survive at least ten years is 0.025, the probability that only the husband will survive at least ten years is 0.01, and the probability that both of them will survive at least ten years is 0.96.

What is the expected excess of premiums over claims, given that the husband survives at least ten years?

10-40. A diagnostic test for the presence of a disease has two possible outcomes: 1 for disease present and 0 for disease not present. Let X denote the disease state of a patient, and let Y denote the outcome of the diagnostic test. The joint probability function of X and Y is given by:

P(X = 0, Y = 0) = 0.800
P(X = 1, Y = 0) = 0.050
P(X = 0, Y = 1) = 0.025
P(X = 1, Y = 1) = 0.125

Calculate Var(Y | X = 1).

10-41. The stock prices of two companies at the end of any given year are modeled with random variables X and Y that follow a distribution with joint density function

$$f(x,y) = \begin{cases} 2x & \text{for } 0 < x < 1 \text{ and } x < y < x + 1 \\ 0 & \text{otherwise} \end{cases}$$

What is the conditional variance of Y given that X = x?
10-42. An actuary determines that the annual numbers of tornadoes in counties P and Q are jointly distributed as follows:

Annual number in Q
Annual number in P

0.10 | 0.02 |

Calculate the conditional variance of the annual number of tornadoes in county Q, given that there are no tornadoes in county P.

10-43. A company is reviewing tornado damage claims under a farm insurance policy. Let X be the portion of a claim representing damage to the house and let Y be the portion of the same claim representing damage to the rest of the property. The joint density function of X and Y is

$$f(x,y) = \begin{cases} 6[1 - (x + y)] & \text{for } x > 0,\ y > 0 \text{ and } x + y < 1 \\ 0 & \text{otherwise} \end{cases}$$

Determine the probability that the portion of a claim representing damage to the house is less than 0.2.

10-44. Let X and Y be continuous random variables with joint density function

$$f(x,y) = \begin{cases} 15y & \text{for } x^2 \le y \le x \\ 0 & \text{otherwise} \end{cases}$$

Find g, the marginal density function of Y.
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An auto insurance policy will pay for damage to both the 
policyholder’s car and the other driver’s car in the event that the 
policyholder is responsible for an accident. The size of the 
payment for damage to the policyholder’s car, X, has a marginal 
density function of 1 for 0<x<1. Given X = х, the size of the 


payment for damage to the other driver’s car, Y, has conditional 
density of 1 for x< y < x«1. 


If the policyholder is responsible for an accident, what is the 
probabihty that the payment for damage to the other driver's car 
will be greater than 0.500? 


10-46. An insurance policy is written to cover a loss X where X has
density function

$$f(x) = \begin{cases} \tfrac{3}{8}x^2 & \text{for } 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$

The time (in hours) to process a claim of size x, where 0 ≤ x ≤ 2,
is uniformly distributed on the interval from x to 2x.

Calculate the probability that a randomly chosen claim on this
policy is processed in three hours or more.


10-47. Let X represent the age of an insured automobile involved in an
accident. Let Y represent the length of time the owner has
insured the automobile at the time of the accident.

X and Y have joint probability density function

$$f(x,y) = \begin{cases} \tfrac{1}{64}(10 - xy^2) & \text{for } 2 < x < 10 \text{ and } 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

Calculate the expected age of an insured automobile involved in
an accident.
10-48. A device contains two circuits. The second circuit is a backup for
the first, so the second is used only when the first has failed. The
device fails when and only when the second circuit fails.

Let X and Y be the times at which the first and second circuits
fail, respectively. X and Y have joint probability density function

$$f(x,y) = \begin{cases} 6e^{-x}e^{-2y} & \text{for } 0 < x < y < \infty \\ 0 & \text{otherwise} \end{cases}$$

What is the expected time at which the device fails?


10-49. A study of automobile accidents produced the following data: 


An automobile from one of the model years 1997, 1998, and 
1999 was involved in an accident. 


| Model | Proportion of All Vehicles | Probability of Involvement in an Accident |
| 1997 | 0.16 | 0.05 |


0.03 

0.04 
Determine the probability that the model year of this automobile 
is 1997. 


Chapter 11 
Applying Multivariate 
Distributions 


11.1 Distributions of Functions of 
Two Random Variables 


11.1.1 Functions of X and Y 


Many practical applications require the study of a function of two or
more random variables. For example, if an investor owns two assets with
values X and Y, the function g(X, Y) = X + Y is the random variable
that gives the total value of his two assets.

In this text, we will focus on four important functions: X + Y,
XY, minimum(X, Y), and maximum(X, Y). The reader should be
aware that a more general theory can be developed for a wider class of
functions g(X, Y), but that theory will not be developed in this text.


11.1.2 The Sum of Two Discrete Random Variables 


Example 11.1 We return to the two asset random variables X and 
Y in Example 10.1. 
Probabilities for the sum S = X + Y can be found by direct inspection.
For example, X + Y = 90 can occur only if x = 90 and y = 0.


$$P(X + Y = 90) = p(90, 0) = .05$$

X + Y assumes a value of 100 for the two outcome pairs (100, 0) and
(90, 10).

$$P(X + Y = 100) = p(100, 0) + p(90, 10) = .27 + .15 = .42$$

Similarly,

$$P(X + Y = 110) = p(110, 0) + p(100, 10) = .18 + .33 = .51$$

and

$$P(X + Y = 120) = p(110, 10) = .02.$$


We have now found the entire distribution of S = X + Y. 


| s | 90 | 100 | 110 | 120 |
| p(s) | .05 | .42 | .51 | .02 |


The technique we used to find p(s) was simply to add up all values
of p(x, y) for which x + y = s. Another way to say this is that we added
all joint probability values of the form p(x, s - x). This is stated symbolically as

$$p(s) = \sum_x p(x, s - x). \qquad (11.1)$$
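As an illustrative check of Equation (11.1), the Python sketch below (ours, not the book's) stores the joint probabilities of Example 10.1 in a dictionary, using the six values p(90, 0) = .05, p(100, 0) = .27, p(110, 0) = .18, p(90, 10) = .15, p(100, 10) = .33 and p(110, 10) = .02 that appear in Example 11.1, and adds up p(x, y) over all pairs with x + y = s. The names `joint` and `pmf_of_sum` are our own.

```python
from collections import defaultdict

# Joint probabilities p(x, y) for the two assets of Example 10.1
# (x = value of the first asset, y = value of the second asset).
joint = {
    (90, 0): 0.05, (100, 0): 0.27, (110, 0): 0.18,
    (90, 10): 0.15, (100, 10): 0.33, (110, 10): 0.02,
}


def pmf_of_sum(joint_pmf):
    """Equation (11.1): p(s) is the total of p(x, y) over all pairs with x + y = s."""
    p_s = defaultdict(float)
    for (x, y), prob in joint_pmf.items():
        p_s[x + y] += prob
    return dict(sorted(p_s.items()))


dist = pmf_of_sum(joint)
print({s: round(p, 2) for s, p in dist.items()})
```

The printed distribution should match the table for S; the rounding only hides binary floating-point error (for example, .27 + .15 is not stored exactly).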


11.1.3 The Sum of Independent Discrete Random Variables 


When the two random variables X and Y are independent, then we have
$p(x, s - x) = p_X(x) \cdot p_Y(s - x)$. In this case Equation (11.1) assumes a
form that is convenient for calculation.

Probability Function for S = X + Y
(X and Y Independent)

$$p_S(s) = \sum_x p_X(x) \cdot p_Y(s - x) \qquad (11.2)$$
Example 11.2 An insurance company has two clients. The
random variables representing the number of claims filed by each client
are X and Y. X and Y are independent, and each has the same probability distribution.

| x | 0 | 1 | 2 |
| p(x) | 1/2 | 1/4 | 1/4 |

We can find the distribution for S = X + Y using Equation 11.2.

$$p_S(0) = p_X(0) \cdot p_Y(0) = \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}$$

$$p_S(1) = p_X(0) \cdot p_Y(1) + p_X(1) \cdot p_Y(0) = \tfrac{1}{2} \cdot \tfrac{1}{4} + \tfrac{1}{4} \cdot \tfrac{1}{2} = \tfrac{1}{4}$$

$$p_S(2) = p_X(0) \cdot p_Y(2) + p_X(2) \cdot p_Y(0) + p_X(1) \cdot p_Y(1) = \tfrac{1}{8} + \tfrac{1}{8} + \tfrac{1}{16} = \tfrac{5}{16}$$

$$p_S(3) = p_X(1) \cdot p_Y(2) + p_X(2) \cdot p_Y(1) = \tfrac{1}{16} + \tfrac{1}{16} = \tfrac{1}{8}$$

$$p_S(4) = p_X(2) \cdot p_Y(2) = \tfrac{1}{4} \cdot \tfrac{1}{4} = \tfrac{1}{16}$$

The distribution of S is given by the following:

| s | 0 | 1 | 2 | 3 | 4 |
| p(s) | 1/4 | 1/4 | 5/16 | 1/8 | 1/16 |


The above calculation (based on Equation 11.2) is referred to as finding
the convolution of the two independent random variables X and Y. We
will return to convolutions when we look at the sum of independent
continuous random variables. □
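The convolution in Equation (11.2) is easy to mechanize. The sketch below is illustrative and assumes integer-valued X and Y (true for claim counts); it uses Python's `fractions.Fraction` so that the probabilities 1/2, 1/4, 1/4 of Example 11.2 stay exact. The helper name `convolve` is our own.

```python
from fractions import Fraction


def convolve(p_x, p_y):
    """Equation (11.2) for independent, integer-valued X and Y:
    p_S(s) = sum over x of p_X(x) * p_Y(s - x)."""
    support = range(min(p_x) + min(p_y), max(p_x) + max(p_y) + 1)
    return {
        s: sum((px * p_y.get(s - x, Fraction(0)) for x, px in p_x.items()), Fraction(0))
        for s in support
    }


# Claim-count distribution of Example 11.2, used for both X and Y.
claims = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
dist_s = convolve(claims, claims)
print({s: str(p) for s, p in dist_s.items()})
# {0: '1/4', 1: '1/4', 2: '5/16', 3: '1/8', 4: '1/16'}
```

Missing values of s - x simply contribute zero through `p_y.get(..., Fraction(0))`, which plays the role of "p_Y is zero outside its support" in the hand calculation.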


11.1.4 The Sum of Continuous Random Variables 


Finding probabilities for X + Y is a bit more complicated in the con- 
tinuous case, since summation is replaced by integration. 
Example 11.3 Let X be the sick leave hours last year and Y the
sick leave hours this year in Example 10.9. The joint density function is

$$f(x, y) = 2 - 1.2x - .8y, \quad 0 \le x \le 1,\ 0 \le y \le 1.$$

Let S = X + Y be the total sick leave hours for both years. We will
calculate the probability that S = X + Y ≤ .50. (This is actually a
single value of the cumulative distribution function of the random
variable S, since $P(S \le .50) = F_S(.50)$.) The points (x, y) where the
random variable X + Y is less than or equal to .50 are in the region R in
the x-y plane satisfying the inequalities x + y ≤ .50, for 0 ≤ x ≤ 1,
0 ≤ y ≤ 1. If we integrate the density function f(x, y) over this region,
we will find the desired probability.

$$P(X + Y \le .50) = \iint_R f(x, y)\,dx\,dy$$

The region R is shown in the following figure.

[Figure: the region R in the x-y plane]

We can now evaluate the double integral.

$$\begin{aligned} P(X + Y \le .50) &= \int_0^{.50}\int_0^{.50 - y} (2 - 1.2x - .8y)\,dx\,dy \\ &= \int_0^{.50} \left(2x - .6x^2 - .8xy\right)\Big|_{x=0}^{.50 - y}\,dy \\ &= \int_0^{.50} (.2y^2 - 1.8y + .85)\,dy = .20833 \end{aligned}$$

□
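A minimal numerical check of Example 11.3 is sketched below. It keeps the inner x-integral in the closed form 2a - .6a² - .8ya with a = s - y (the antiderivative from the text, evaluated at its upper limit), and replaces the outer y-integral with Simpson's rule, a numerical technique the text does not use. Because the outer integrand is a quadratic in y, Simpson's rule with two subintervals is exact here. The helpers `inner_integral`, `simpson` and `cdf_sum` are hypothetical names, and `cdf_sum` assumes 0 ≤ s ≤ 1, where the region R keeps the same triangular shape.

```python
def inner_integral(y, s):
    """Exact inner integral of f(x, y) = 2 - 1.2x - .8y over 0 <= x <= s - y."""
    a = s - y
    return 2 * a - 0.6 * a**2 - 0.8 * y * a


def simpson(g, lo, hi, n=2):
    """Composite Simpson's rule with an even number n of subintervals.
    It is exact for polynomials of degree 3 or less."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


def cdf_sum(s):
    """F_S(s) = P(X + Y <= s); this region shape is valid only for 0 <= s <= 1."""
    return simpson(lambda y: inner_integral(y, s), 0.0, s)


print(round(cdf_sum(0.50), 5))  # 0.20833
```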


Example 11.3 required a fair amount of work to find a single value
of $F_S(s)$. However, the pattern of the last calculation will apply to the
task of finding $F_S(s)$ for 0 ≤ s ≤ 1. The region of integration changes to
require a different integral for $F_S(s)$ for 1 < s ≤ 2. This reasoning is
developed in Exercise 11-4.
11.1.5 The Sum of Independent Continuous Random Variables


In the preceding example, the two random variables X and Y were not 
independent. In many applications, the random variables which are being 
added are independent. Fortunately, calculations are simpler if X and Y 
are independent. The simplification results from the use of a convolution 
rule. For two independent discrete random variables, the convolution 
rule was 


$$p_S(s) = \sum_x p_X(x) \cdot p_Y(s - x).$$


The same reasoning with summation replaced by integration leads to the 
continuous convolution principle. 


Density Function for S = X + Y
(X and Y Independent)

$$f_S(s) = \int_{-\infty}^{\infty} f_X(x) \cdot f_Y(s - x)\,dx \qquad (11.3)$$


Example 11.4 In Example 10.10, we looked at the waiting times
S and T between accidents in two towns. For notational simplicity, we
will use the variable names X and Y instead of S and T in this example.
The probability density function and marginal density functions are

$$f(x, y) = e^{-(x + y)}, \text{ for } x > 0,\ y > 0,$$

$$f_X(x) = e^{-x}, \text{ for } x > 0,$$

and

$$f_Y(y) = e^{-y}, \text{ for } y > 0.$$

In Example 10.27, we showed that X and Y are independent. Thus we
can use Equation 11.3 to find the density function of S = X + Y.

$$f_S(s) = \int_{-\infty}^{\infty} f_X(x) \cdot f_Y(s - x)\,dx = \int_0^s e^{-x} e^{-(s - x)}\,dx = \int_0^s e^{-s}\,dx = s e^{-s}$$

Note the limits on the second integral above. The random variables X,
Y, and S are all non-negative. Thus x ≥ 0, y = s - x ≥ 0, and
s ≥ x ≥ 0.

The two independent random variables X and Y were exponential
with parameter β = 1. The sum S = X + Y is a gamma random
variable with parameters α = 2 and β = 1. In Section 8.3.3 we stated
(without proof) that the sum of n independent exponential random
variables with parameter β has a gamma distribution with parameters
α = n and β. We have just derived a special case of that result. □
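Equation (11.3) can also be approximated numerically when a closed form is not obvious. The sketch below is illustrative: it assumes both densities are zero for negative arguments (so the integral runs over 0 ≤ x ≤ s, as noted for Example 11.4), uses a midpoint rule with an arbitrary default of 1,000 subintervals, and compares the result with s e^{-s}. The helper names are our own.

```python
import math


def exp_density(t, beta=1.0):
    """Exponential density with parameter beta (the rate); zero for t < 0."""
    return beta * math.exp(-beta * t) if t >= 0 else 0.0


def convolution_density(f_x, f_y, s, n=1_000):
    """Equation (11.3) for non-negative X and Y: the integral of
    f_X(x) * f_Y(s - x) over 0 <= x <= s, by the midpoint rule."""
    if s <= 0:
        return 0.0
    h = s / n
    return h * sum(f_x((i + 0.5) * h) * f_y(s - (i + 0.5) * h) for i in range(n))


for s in (0.5, 1.0, 2.0):
    numeric = convolution_density(exp_density, exp_density, s)
    print(s, round(numeric, 6), round(s * math.exp(-s), 6))
```

For these exponential densities the integrand is the constant e^{-s} on [0, s], so the midpoint sum reproduces s e^{-s} up to rounding; for other densities the number of subintervals would matter.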


The distribution of X + Y could also be found by evaluating the
cumulative probability $P(S \le s) = F_S(s)$ as a double integral.

$$P(X + Y \le s) = \iint_{x + y \le s} f(x, y)\,dx\,dy$$


The reader is asked to do this in Exercise 11-5. The convolution 
approach is simpler, and is widely used. The reader should be aware that 
in some examples the limits of integration in Equation 11.3 become 
tricky. In the following sections, we will look at even simpler ways to 
obtain information about X + Y. 


11.1.6 The Minimum of Two Independent Exponential 
Random Variables 


For most of this section we have concentrated on the function
g(X, Y) = X + Y. To illustrate that distribution functions can be found
for other functions of X and Y, we will now look at the minimum
function min(X, Y) for independent exponential random variables X
and Y. We first need to review basic properties of the exponential
random variable. An exponential random variable X with parameter β
has the following cumulative and survival functions:

$$F(t) = P(X \le t) = 1 - e^{-\beta t}$$

$$S(t) = P(X > t) = e^{-\beta t}$$
Suppose that X and Y are exponential with parameters β and λ,
respectively, and let M denote the random variable min(X, Y). We will
find the survival function for M.

$$S_M(t) = P(\min(X, Y) > t) = P(X > t \text{ and } Y > t) \overset{\text{independence}}{=} P(X > t) \cdot P(Y > t) = e^{-\beta t} e^{-\lambda t} = e^{-(\beta + \lambda)t}$$

The function $e^{-(\beta + \lambda)t}$ is the survival function S(t) for an exponential
distribution with parameter β + λ. Thus M must have that distribution.

Minimum of Independent Exponential Random Variables
(X and Y Independent with Parameters β and λ)

M = min(X, Y) is exponential with parameter β + λ.


Example 11.5 We return to X and Y, the independent waiting
times for accidents in Example 11.4. X and Y have exponential
distributions with parameters β = 1 and λ = 1, respectively. Then
M = min(X, Y) has an exponential distribution with parameter
β + λ = 2. This can be interpreted in a natural way. In each of two
separate towns, we are waiting for the first accident in a process where
the average number of accidents is 1 per month. When we study the
accidents for both towns, we are waiting for the first accident in a
process where the average number of accidents is a total of 2 per month.
□
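A short Monte Carlo sketch can make this result concrete. It assumes Python's `random.expovariate`, whose argument is the rate (1 divided by the mean), matching the parameters β and λ used here; the sample size and seed are arbitrary choices of ours, and `mean_of_min` is a hypothetical helper. The average of the simulated minimums should be close to 1/(β + λ), the mean of an exponential distribution with parameter β + λ.

```python
import random


def mean_of_min(beta, lam, n=200_000, seed=1):
    """Monte Carlo estimate of E[min(X, Y)] for independent exponentials
    with rates beta and lam."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += min(rng.expovariate(beta), rng.expovariate(lam))
    return total / n


estimate = mean_of_min(1.0, 1.0)
print(round(estimate, 3), "vs. 1/(beta + lambda) =", 1 / (1.0 + 1.0))
```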


11.1.7 The Minimum and Maximum of any Two Independent 
Random Variables 


Suppose that X and Y are two independent random variables. 
Recall that the survival function of a random variable X is defined
by

$$S_X(t) = P(X > t) = 1 - F_X(t).$$

The general reasoning for analyzing Min = min(X, Y) follows the argument we used for the minimum of two independent exponential random
variables.

$$S_{\text{Min}}(t) = P(\min(X, Y) > t) = P(X > t \;\&\; Y > t) \overset{\text{independence}}{=} P(X > t) \cdot P(Y > t) = S_X(t) S_Y(t)$$

The method of analysis for Max = max(X, Y) is very similar.

$$F_{\text{Max}}(t) = P(\max(X, Y) \le t) = P(X \le t \;\&\; Y \le t) \overset{\text{independence}}{=} P(X \le t) \cdot P(Y \le t) = F_X(t) F_Y(t)$$

The next example shows that once we use the previous identities to get
$F_{\text{Max}}(t)$ or $S_{\text{Min}}(t)$, we can find density functions and expected values
for the maximum and the minimum.


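Example 11.6 For a uniform random variable X on [0, 100],

$$F_X(x) = \frac{x}{100} \quad \text{and} \quad S_X(x) = 1 - \frac{x}{100} = \frac{100 - x}{100}.$$

Suppose X and Y are independent uniform random variables on [0, 100].
Then

$$S_{\text{Min}}(t) = P(\min(X, Y) > t) = S_X(t) S_Y(t) = \frac{(100 - t)^2}{10{,}000}$$

$$F_{\text{Min}}(t) = 1 - \frac{(100 - t)^2}{10{,}000}$$

$$F_{\text{Max}}(t) = P(\max(X, Y) \le t) = F_X(t) F_Y(t) = \frac{t^2}{10{,}000}$$

Taking derivatives, we can find the density functions for
min(X, Y) and max(X, Y).

$$f_{\text{Min}}(t) = \frac{2(100 - t)}{10{,}000} = \frac{100 - t}{5{,}000}, \qquad f_{\text{Max}}(t) = \frac{2t}{10{,}000} = \frac{t}{5{,}000}$$

$$E[\min(X, Y)] = \int_0^{100} t \cdot \frac{100 - t}{5{,}000}\,dt = \left(\frac{100t^2}{10{,}000} - \frac{t^3}{15{,}000}\right)\Big|_0^{100} = 33.33$$

$$E[\max(X, Y)] = \int_0^{100} t \cdot \frac{t}{5{,}000}\,dt = \frac{t^3}{15{,}000}\Big|_0^{100} = 66.67$$

□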
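The two expected values in Example 11.6 can be checked numerically. This sketch integrates t·f_Min(t) and t·f_Max(t) over [0, 100] with a midpoint rule (our choice; any quadrature would do), using the densities derived in the example. The helper names are our own.

```python
def midpoint(g, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def f_min(t):
    """Density of min(X, Y) from Example 11.6, for 0 <= t <= 100."""
    return (100 - t) / 5_000


def f_max(t):
    """Density of max(X, Y) from Example 11.6, for 0 <= t <= 100."""
    return t / 5_000


e_min = midpoint(lambda t: t * f_min(t), 0, 100)
e_max = midpoint(lambda t: t * f_max(t), 0, 100)
print(round(e_min, 2), round(e_max, 2))  # 33.33 66.67
```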


This method can easily be extended to more than two independent random variables, as the next example shows.

Example 11.7 Let X, Y and Z be three independent exponential
random variables with mean 100. Find P(max(X, Y, Z) ≤ 50).

Solution Each of the random variables has density function and
cumulative distribution function

$$f(x) = .01e^{-.01x}, \qquad F(x) = 1 - e^{-.01x}.$$

Using the same reasoning used for two random variables, we see that

$$\begin{aligned} P(\max(X, Y, Z) \le 50) &= P(X \le 50 \;\&\; Y \le 50 \;\&\; Z \le 50) \\ &\overset{\text{independence}}{=} F_X(50) F_Y(50) F_Z(50) \\ &= \left(1 - e^{-.01(50)}\right)^3 \\ &= .061 \end{aligned}$$

□
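The product rule in Example 11.7 translates directly into code. In this sketch the CDF is written in terms of the mean, 1 - e^{-x/mean}, which is 1 - e^{-.01x} for mean 100; `exp_cdf` and `prob_max_at_most` are hypothetical helpers, and multiplying the CDFs is valid only because the variables are independent.

```python
import math


def exp_cdf(x, mean):
    """CDF of an exponential random variable with the given mean."""
    return 1 - math.exp(-x / mean)


def prob_max_at_most(t, cdfs):
    """For independent variables, P(max <= t) is the product of the CDFs at t."""
    prob = 1.0
    for cdf in cdfs:
        prob *= cdf(t)
    return prob


cdfs = [lambda x: exp_cdf(x, 100)] * 3  # X, Y and Z, each with mean 100
print(round(prob_max_at_most(50, cdfs), 3))  # 0.061
```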


11.2 Expected Values of Functions of Random Variables 
11.2.1 Finding E[g(X,Y)] 


We have seen that finding the distribution of g(X, Y) can require a fair
amount of work for a function as simple as g(X, Y) = X + Y. However,
the expected value of g(X, Y) can be found without first finding the
distribution of g(X, Y). This is due to the following theorem which is
stated without proof.

Theorem 11.1 Let X and Y be random variables and let g(x, y)
be a function of two variables.

(a) If X and Y are discrete with joint probability function p(x, y),

$$E[g(X, Y)] = \sum_x \sum_y g(x, y) \cdot p(x, y).$$

(b) If X and Y are continuous with joint density f(x, y),

$$E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y) \cdot f(x, y)\,dx\,dy.$$


11.2.2 Finding E(X + Y) 


We will begin with an example to illustrate the application of the
preceding theorem with g(x, y) = x + y.


Example 11.8 We return to the two asset random variables X and 
Y in Example 10.1. 


The theorem says that

$$\begin{aligned} E(X + Y) &= \sum_x \sum_y (x + y) \cdot p(x, y) \\ &= (0 + 90)(.05) + (0 + 100)(.27) + (0 + 110)(.18) \\ &\quad + (10 + 90)(.15) + (10 + 100)(.33) + (10 + 110)(.02) \\ &= 105. \end{aligned}$$

We were not required to find the probability function for S = X + Y.
The theorem allows us to work directly with the joint distribution
function. We can check our answer here, since we have already found
the probability function for S.

| s | 90 | 100 | 110 | 120 |
| p(s) | .05 | .42 | .51 | .02 |

Then E(S) = 90(.05) + 100(.42) + 110(.51) + 120(.02) = 105. □
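Theorem 11.1(a) says that only the joint probability function is needed. The sketch below passes g as a Python function to a hypothetical helper `expect`; the joint table is the one from Example 10.1 (the six probabilities used term by term in Example 11.8).

```python
# Joint probabilities p(x, y) for the two assets of Example 10.1.
joint = {
    (90, 0): 0.05, (100, 0): 0.27, (110, 0): 0.18,
    (90, 10): 0.15, (100, 10): 0.33, (110, 10): 0.02,
}


def expect(g, joint_pmf):
    """Theorem 11.1(a): E[g(X, Y)] = sum over all (x, y) of g(x, y) * p(x, y)."""
    return sum(g(x, y) * prob for (x, y), prob in joint_pmf.items())


e_sum = expect(lambda x, y: x + y, joint)
e_x = expect(lambda x, y: x, joint)
e_y = expect(lambda x, y: y, joint)
print(round(e_sum, 6), round(e_x, 6), round(e_y, 6))  # 105.0 100.0 5.0
```

Because `expect` accepts any function of (x, y), the same helper gives E(XY) or the covariance by passing a different g.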
A very useful result becomes apparent if we look at the random
variables X and Y in the last example separately. We have previously
shown that E(X) = 100 and E(Y) = 5. Thus

105 = E(X + Y) = E(X) + E(Y).

This useful result always holds. If X and Y are discrete,

$$\begin{aligned} E(X + Y) &= \sum_x \sum_y (x + y) \cdot p(x, y) \\ &= \sum_x \sum_y x \cdot p(x, y) + \sum_y \sum_x y \cdot p(x, y) \\ &= \sum_x x \sum_y p(x, y) + \sum_y y \sum_x p(x, y) \\ &= \sum_x x \cdot p_X(x) + \sum_y y \cdot p_Y(y) \\ &= E(X) + E(Y). \end{aligned}$$


A similar proof is used for continuous random variables, with summa- 
tion replaced by integration. This is left for Exercise 11-9. 


Expected Value of a Sum of Two Random Variables 


$$E(X + Y) = E(X) + E(Y) \qquad (11.4)$$

Example 11.9 Let X be the sick leave hours last year and Y the
sick leave hours this year from Example 10.9. We have shown in
Example 10.13 that E(X) = .40 and E(Y) = .43. Then

E(X + Y) = .40 + .43 = .83. □
11.2.3 The Expected Value of XY 


We have just shown that the expected value of a sum is the sum of the
expected values. Products of random variables are not so simple; the
expected value of XY does not always equal the product of the expected
values. This is shown in the next example.

Example 11.10 We return again to the two asset random variables
X and Y in Example 10.1.

Using the expected value theorem with g(x, y) = xy,

$$\begin{aligned} E(XY) &= \sum_x \sum_y (xy) \cdot p(x, y) \\ &= (0 \cdot 90)(.05) + (0 \cdot 100)(.27) + (0 \cdot 110)(.18) \\ &\quad + (10 \cdot 90)(.15) + (10 \cdot 100)(.33) + (10 \cdot 110)(.02) \\ &= 487. \end{aligned}$$

Note that

E(X) · E(Y) = 100(5) = 500.

In this case, E(XY) ≠ E(X) · E(Y). □


In the special case where X and Y are independent, it is true that
E(XY) = E(X) · E(Y). If X and Y are discrete and independent,

$$\begin{aligned} E(X) \cdot E(Y) &= \left(\sum_x x \cdot p_X(x)\right)\left(\sum_y y \cdot p_Y(y)\right) \\ &= \sum_x \sum_y xy \cdot p_X(x)\, p_Y(y) \\ &= \sum_x \sum_y xy \cdot p(x, y) \\ &= E(XY). \end{aligned}$$

A similar proof applies for independent continuous random variables.

Expected Value of XY
(X and Y Independent)

$$E(XY) = E(X) \cdot E(Y) \qquad (11.5)$$

Note: a) The identity in (11.5) may fail to hold if X and Y are not
independent. b) There are examples of random variables X and Y which
are not independent but satisfy (11.5). See Exercise 11-19.


Example 11.11 The random variables X and Y in Example 11.2 
represented the number of claims filed by two insured clients. X and Y 
were independent, and each had the same probability distribution. 


Each random variable also had the same expected value.

$$E(X) = 0\left(\tfrac{1}{2}\right) + 1\left(\tfrac{1}{4}\right) + 2\left(\tfrac{1}{4}\right) = \tfrac{3}{4} = E(Y)$$

By Equation (11.5),

$$E(XY) = E(X) \cdot E(Y) = \left(\tfrac{3}{4}\right)\left(\tfrac{3}{4}\right) = \tfrac{9}{16}.$$

□

In Exercise 11-10, the reader is asked to find E(XY) directly and
verify the last answer.


Example 11.12 X and Y, the waiting times for accidents in
Example 11.4, were independent exponential distributions with parameters β = 1 and λ = 1.

$$E(X) = \frac{1}{\beta} = 1 = E(Y)$$

By Equation (11.5),

E(XY) = E(X) · E(Y) = 1. □

It is important to be able to calculate E(XY) directly when X and
Y are not known to be independent. We have already done this for the
discrete case in Example 11.10. The following example illustrates the
calculation for the continuous case.
Example 11.13 Let X be the sick leave hours last year and Y the
sick leave hours this year from Example 10.9. The joint density function
is

$$f(x, y) = 2 - 1.2x - .8y, \quad \text{for } 0 \le x \le 1,\ 0 \le y \le 1.$$

We will calculate E(XY) by integration, using part (b) of Theorem 11.1.

$$\begin{aligned} E(XY) &= \int_0^1\int_0^1 xy(2 - 1.2x - .8y)\,dx\,dy \\ &= \int_0^1 \left(x^2 y - .4x^3 y - .4x^2 y^2\right)\Big|_{x=0}^{1}\,dy \\ &= \int_0^1 (-.4y^2 + .6y)\,dy = \frac{1}{6} \end{aligned}$$

The reader should note that E(X) · E(Y) = .40(.433) = .173 ≠ E(XY). □


11.2.4 The Covariance of X and Y 
The covariance is an extremely useful expected value with many applications. It is a key component of the formula for V(X + Y), and it is
used in measuring association between random variables.

Definition 11.1 Let X and Y be random variables. The covariance of X and Y is defined by

$$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)].$$

Example 11.14 For the two asset random variables X and Y in
Example 10.1, $E(X) = \mu_X = 100$ and $E(Y) = \mu_Y = 5$. The joint distribution table is as follows:

| | x = 90 | x = 100 | x = 110 | $p_Y(y)$ |
| y = 0 | .05 | .27 | .18 | .50 |
| y = 10 | .15 | .33 | .02 | .50 |
| $p_X(x)$ | .20 | .60 | .20 | |
We will calculate Cov(X, Y) directly from the definition.

$$\begin{aligned} \text{Cov}(X, Y) &= E[(X - \mu_X)(Y - \mu_Y)] \\ &= (90 - 100)(0 - 5)(.05) + (100 - 100)(0 - 5)(.27) + (110 - 100)(0 - 5)(.18) \\ &\quad + (90 - 100)(10 - 5)(.15) + (100 - 100)(10 - 5)(.33) + (110 - 100)(10 - 5)(.02) \\ &= 50(.05) + 0(.27) + (-50)(.18) + (-50)(.15) + 0(.33) + 50(.02) \\ &= 2.5 + 0 - 9 - 7.5 + 0 + 1 \\ &= -13 \end{aligned}$$

□

The sign of the covariance is determined by the relationship
between the random variables X and Y. In our example above, the
random variables X and Y are said to be negatively associated, since
higher values of X tend to occur simultaneously with lower values of Y.
The covariance was negative for these negatively associated random
variables because the negative terms in the covariance had more
influence on the sum than the positive terms. (The negative terms are
-50(.18) and -50(.15).) Note that an individual term in the covariance is
negative when $(x - \mu_X)$ and $(y - \mu_Y)$ are of opposite sign and positive
when $(x - \mu_X)$ and $(y - \mu_Y)$ have the same sign. Thus the negative
terms occur when the realized value of X is above the mean and the
value of Y is simultaneously below the mean or vice versa, i.e., when
higher values of X are paired with lower values of Y or vice versa.

Paired random variables such as the height and weight of an 
individual are said to be positively associated, because higher values of 
both tend to occur for the same individuals and lower values do the
same. For positively associated random variables, the covariance will be 
positive. The study of measures of association is really a topic for a 
statistics course, but it is useful to have some idea of the meaning that is 
attached to the covariance in this course. Positive covariance implies 
some positive association, and negative covariance implies some nega- 
tive association. 

We calculated the covariance directly from the definition in the 
last example in order to give an intuitive interpretation. There is another 
way to calculate the covariance. 


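$$\begin{aligned} \text{Cov}(X, Y) &= E[(X - \mu_X)(Y - \mu_Y)] \\ &= E(XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y) \\ &= E(XY) - \mu_Y E(X) - \mu_X E(Y) + \mu_X \mu_Y \\ &= E(XY) - \mu_X \mu_Y \end{aligned}$$

Alternative Calculation of Covariance

$$\text{Cov}(X, Y) = E(XY) - E(X) \cdot E(Y) \qquad (11.6)$$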


Example 11.15 For the two asset random variables X and Y in
Example 10.1, $E(X) = \mu_X = 100$ and $E(Y) = \mu_Y = 5$. In Example
11.10 we showed that E(XY) = 487. Then Equation (11.6) shows that

Cov(X, Y) = E(XY) - E(X) · E(Y) = 487 - (100)(5) = -13. □
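Both routes to the covariance can be compared directly. The sketch below restates the joint table of Example 10.1 and a small expectation helper `expect` (our own name, implementing Theorem 11.1(a)) so that it runs on its own; it recomputes E(XY) from Example 11.10 and then Cov(X, Y) from Definition 11.1 and from Equation (11.6).

```python
# Joint probabilities p(x, y) for the two assets of Example 10.1.
joint = {
    (90, 0): 0.05, (100, 0): 0.27, (110, 0): 0.18,
    (90, 10): 0.15, (100, 10): 0.33, (110, 10): 0.02,
}


def expect(g, joint_pmf):
    """Theorem 11.1(a): E[g(X, Y)] = sum of g(x, y) * p(x, y)."""
    return sum(g(x, y) * prob for (x, y), prob in joint_pmf.items())


mu_x = expect(lambda x, y: x, joint)
mu_y = expect(lambda x, y: y, joint)
e_xy = expect(lambda x, y: x * y, joint)

cov_definition = expect(lambda x, y: (x - mu_x) * (y - mu_y), joint)  # Definition 11.1
cov_shortcut = e_xy - mu_x * mu_y                                      # Equation (11.6)

print(round(e_xy, 6), round(mu_x * mu_y, 6))  # E(XY) versus E(X) * E(Y)
print(round(cov_definition, 6), round(cov_shortcut, 6))
```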


Example 11.16 Let X be the sick leave hours last year and Y the
sick leave hours this year from Example 10.9. In Example 11.13 we
showed that E(XY) = 1/6 ≈ .1667 and E(X) · E(Y) = .1733. Then Equation
(11.6) shows that

Cov(X, Y) = .1667 - .1733 = -.0066. □


We know from Equation (11.5) that when X and Y are independent, E(XY) = E(X) · E(Y). This means that Cov(X, Y) will be zero.

Covariance of X and Y
(X and Y Independent)

Cov(X, Y) = 0

Example 11.17 X and Y, the waiting times for accidents in
Example 11.4, were independent exponential distributions with parameters β = 1 and λ = 1. Then Cov(X, Y) = 0. □

11.2.5 The Variance of X + Y


The covariance is of special interest because it can be used in a simple
formula for the variance of the sum of two random variables.

Variance of X + Y

$$V(X + Y) = V(X) + V(Y) + 2 \cdot \text{Cov}(X, Y) \qquad (11.7)$$

The derivation is straightforward.

$$\begin{aligned} V(X + Y) &= E[(X + Y)^2] - [E(X + Y)]^2 \\ &= E(X^2 + 2XY + Y^2) - (\mu_X + \mu_Y)^2 \\ &= E(X^2) + 2E(XY) + E(Y^2) - (\mu_X^2 + 2\mu_X\mu_Y + \mu_Y^2) \\ &= [E(X^2) - \mu_X^2] + [E(Y^2) - \mu_Y^2] + 2[E(XY) - \mu_X\mu_Y] \\ &= V(X) + V(Y) + 2 \cdot \text{Cov}(X, Y) \end{aligned}$$


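The calculations in our previous examples will now enable us to
calculate V(X + Y) without finding the distribution of X + Y.

Example 11.18 The joint probability function for the two assets
in Examples 10.1 and 10.3 is given below (with marginals included).

| | x = 90 | x = 100 | x = 110 | $p_Y(y)$ |
| y = 0 | .05 | .27 | .18 | .50 |
| y = 10 | .15 | .33 | .02 | .50 |
| $p_X(x)$ | .20 | .60 | .20 | |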
We have already found that E(X) = 100, V(X) = 40, E(Y) = 5 and
V(Y) = 25. In Example 11.15, we found that Cov(X, Y) = -13. Thus

V(X + Y) = V(X) + V(Y) + 2 · Cov(X, Y) = 40 + 25 + 2(-13) = 39. □

We can proceed in the same way if X and Y are continuous.


Example 11.19 Let X be the sick leave hours last year and Y the 
sick leave hours this year from Example 10.9. The joint density and 
marginal density functions are 


$$f(x, y) = 2 - 1.2x - .8y, \quad \text{for } 0 \le x \le 1,\ 0 \le y \le 1,$$

$$f_X(x) = 1.6 - 1.2x, \quad \text{for } 0 \le x \le 1,$$

and

$$f_Y(y) = 1.4 - 0.8y, \quad \text{for } 0 \le y \le 1.$$

We have already found that E(X) = .40 and E(Y) = .43. Using the
marginal density functions,

$$E(X^2) = \int_0^1 x^2(1.6 - 1.2x)\,dx = .233,$$

$$E(Y^2) = \int_0^1 y^2(1.4 - 0.8y)\,dy = .266,$$

$$V(X) = .233 - .40^2 = .0733,$$

and

$$V(Y) = .266 - .433^2 = .0788.$$

In Example 11.16, we found that Cov(X, Y) = -.0066. Thus

V(X + Y) = V(X) + V(Y) + 2 · Cov(X, Y) = .0733 + .0788 + 2(-.0066) = .1389. □
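For the continuous sick leave example, all of these moments can be computed exactly. The sketch below relies on the fact that the integral of x^a y^b over the unit square is 1/((a + 1)(b + 1)), so E(X^a Y^b) under f(x, y) = 2 - 1.2x - .8y is a short combination of such terms; writing 1.2 and .8 as 6/5 and 4/5 lets `fractions.Fraction` keep everything exact. The helper `moment` is our own.

```python
from fractions import Fraction


def moment(a, b):
    """E(X^a Y^b) for f(x, y) = 2 - (6/5)x - (4/5)y on the unit square,
    using: integral of x^a y^b over [0, 1]^2 equals 1 / ((a + 1)(b + 1))."""
    return (Fraction(2, (a + 1) * (b + 1))
            - Fraction(6, 5) * Fraction(1, (a + 2) * (b + 1))
            - Fraction(4, 5) * Fraction(1, (a + 1) * (b + 2)))


e_x, e_y, e_xy = moment(1, 0), moment(0, 1), moment(1, 1)
var_x = moment(2, 0) - e_x**2
var_y = moment(0, 2) - e_y**2
cov = e_xy - e_x * e_y             # Equation (11.6)
var_sum = var_x + var_y + 2 * cov  # Equation (11.7)

for name, value in [("E(X)", e_x), ("E(Y)", e_y), ("V(X)", var_x), ("V(Y)", var_y),
                    ("Cov(X, Y)", cov), ("V(X + Y)", var_sum)]:
    print(f"{name:10} {str(value):>8} {float(value):.4f}")
```

Done exactly, Cov(X, Y) = -1/150 ≈ -.0067 and V(X + Y) = 5/36 ≈ .1389; the slightly different last digits in Examples 11.16 and 11.19 come from rounding the intermediate values.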
In the special case where the random variables X and Y are independent, Cov(X, Y) = 0. This leads to a nice result for independent
random variables.

Variance of X + Y
(X and Y Independent)

$$V(X + Y) = V(X) + V(Y) \qquad (11.8)$$
Example 11.20 X and Y, the waiting times for accidents in
Example 11.4, were independent exponential distributions with parameters β = 1 and λ = 1. Then

$$V(X) = \frac{1}{\beta^2} = \frac{1}{1^2} = 1 = V(Y).$$

Equation (11.8) shows that

V(X + Y) = V(X) + V(Y) = 2. □
11.2.6 Useful Properties of Covariance 


The covariance has a number of useful properties. Five of these are
given below with derivations.

(1) Cov(X, Y) = Cov(Y, X)

$$E[(X - \mu_X)(Y - \mu_Y)] = E[(Y - \mu_Y)(X - \mu_X)]$$

(2) Cov(X, X) = V(X)

$$\text{Cov}(X, X) = E[(X - \mu_X)(X - \mu_X)] = E[(X - \mu_X)^2] = V(X)$$

(3) If k is a constant random variable, then Cov(X, k) = 0.

Since k is constant, E(k) = k. Then

$$\text{Cov}(X, k) = E[(X - \mu_X)(k - k)] = E[0] = 0.$$

(4) Cov(aX, bY) = ab · Cov(X, Y)

Since $E(aX) = a\mu_X$ and $E(bY) = b\mu_Y$, then

$$\begin{aligned} \text{Cov}(aX, bY) &= E[(aX - a\mu_X)(bY - b\mu_Y)] \\ &= ab \cdot E[(X - \mu_X)(Y - \mu_Y)] \\ &= ab \cdot \text{Cov}(X, Y). \end{aligned}$$
(5) Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)

Since $E(Y + Z) = E(Y) + E(Z) = \mu_Y + \mu_Z$, then

$$\begin{aligned} \text{Cov}(X, Y + Z) &= E[(X - \mu_X)(Y + Z - (\mu_Y + \mu_Z))] \\ &= E[(X - \mu_X)((Y - \mu_Y) + (Z - \mu_Z))] \\ &= E[(X - \mu_X)(Y - \mu_Y)] + E[(X - \mu_X)(Z - \mu_Z)] \\ &= \text{Cov}(X, Y) + \text{Cov}(X, Z). \end{aligned}$$
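The five properties can be spot-checked numerically on the asset example. In this sketch `cov` (our own helper) computes the covariance of any two functions of (X, Y) straight from Definition 11.1, using the joint table of Example 10.1; the constant k = 7 and the choice Z = X in property (5) are arbitrary illustrations. A spot check on one distribution illustrates the properties but does not prove them.

```python
# Joint probabilities p(x, y) for the two assets of Example 10.1.
joint = {
    (90, 0): 0.05, (100, 0): 0.27, (110, 0): 0.18,
    (90, 10): 0.15, (100, 10): 0.33, (110, 10): 0.02,
}


def expect(g):
    """E[g(X, Y)] from the joint table (Theorem 11.1(a))."""
    return sum(g(x, y) * prob for (x, y), prob in joint.items())


def cov(g, h):
    """Covariance of g(X, Y) and h(X, Y), straight from Definition 11.1."""
    mean_g, mean_h = expect(g), expect(h)
    return expect(lambda x, y: (g(x, y) - mean_g) * (h(x, y) - mean_h))


def x_of(x, y):
    return x


def y_of(x, y):
    return y


print(round(cov(x_of, y_of), 6), round(cov(y_of, x_of), 6))           # (1) symmetry
print(round(cov(x_of, x_of), 6))                                      # (2) V(X), which is 40
print(round(cov(x_of, lambda x, y: 7.0), 6))                          # (3) constant k = 7
print(round(cov(lambda x, y: 2 * x, lambda x, y: 3 * y), 6))          # (4) 2 * 3 * Cov(X, Y)
print(round(cov(x_of, lambda x, y: y + x), 6),
      round(cov(x_of, y_of) + cov(x_of, x_of), 6))                    # (5) with Z = X
```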
11.2.7 The Correlation Coefficient 


The correlation coefficient is used in statistics to measure the level of
association between two random variables X and Y. A detailed analysis
of the correlation coefficient and its properties can be found in any
mathematical statistics text. The correlation coefficient is defined using
the covariance. We have already observed that the sign of the covariance
is determined by the association between X and Y.

Definition 11.2 Let X and Y be random variables. The correlation coefficient between X and Y is defined by

$$\rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\text{Cov}(X, Y)}{\sqrt{V(X) \cdot V(Y)}}.$$

Although we will not prove all of the properties of $\rho_{XY}$ discussed in this
section, it is a simple matter to derive the value of $\rho_{XY}$ when X and Y
are linearly related, i.e., Y = aX + b.

$$\rho_{XY} = \frac{\text{Cov}(X, aX + b)}{\sigma_X \sigma_{aX+b}} = \frac{\text{Cov}(X, aX) + \text{Cov}(X, b)}{\sigma_X \cdot |a|\sigma_X} = \frac{a \cdot V(X) + 0}{|a| \cdot V(X)} = \frac{a}{|a|} = \begin{cases} 1 & a > 0 \\ -1 & a < 0 \end{cases}$$

Thus when X and Y are linearly related, the correlation coefficient is 1
when the slope of the straight line is positive, and -1 when the slope is
negative. The following properties can also be shown.
(a) If $\rho_{XY} = 1$, then Y = aX + b with a > 0.
(b) If $\rho_{XY} = -1$, then Y = aX + b with a < 0.*

Thus we can simply look at the correlation coefficient and determine that
there is a positive linear relationship between X and Y if $\rho_{XY} = 1$ or a
negative linear relationship between X and Y if $\rho_{XY} = -1$.

To see what might happen when X and Y are not linearly related,
we will look at the extreme case in which X and Y are independent and
have no systematic relationship. When X and Y are independent, then
Cov(X, Y) = 0. Thus

$$\rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} \overset{\text{independence}}{=} \frac{0}{\sigma_X \sigma_Y} = 0.$$

Clearly $\rho_{XY} = 0$ whenever Cov(X, Y) = 0. (There are examples of
random variables X and Y which are not independent but still satisfy
Cov(X, Y) = 0. One is given in Exercise 11-19.)

It can be shown that

$$-1 \le \rho_{XY} \le 1$$

for any random variables X and Y. We display the possible values of
$\rho_{XY}$ and their verbal interpretations on the following diagram.

[Diagram: a number line from -1 to 1. At -1: negative linear relationship, Y = aX + b, a < 0. At 0: no linear relationship. At 1: positive linear relationship, Y = aX + b, a > 0.]

The possible values of $\rho_{XY}$ lie on a continuum between -1 and 1.
Values of $\rho_{XY}$ close to ±1 are interpreted as an indication of a high
level of linear association between X and Y. Values of $\rho_{XY}$ near 0 are
interpreted as implying little or no linear relationship between X and Y.

In the following examples, we will find $\rho_{XY}$ for random variables
presented earlier in this chapter.

* More advanced texts would say that Y = aX + b with probability 1. This is done to
include more complicated random variables which are beyond the scope of this text.
Example 11.21 Let X and Y be the two asset random variables
defined in Example 10.1. We have shown that V(X) = 40, V(Y) = 25
and Cov(X, Y) = -13.

$$\rho_{XY} = \frac{-13}{\sqrt{40(25)}} = -.411$$

□

Example 11.22 Let X and Y be the sick leave hour random
variables defined in Example 10.9. We have shown that V(X) = .073,
V(Y) = .078, and Cov(X, Y) = -.0066.

$$\rho_{XY} = \frac{-.0066}{\sqrt{.073(.078)}} = -.088$$

□
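Definition 11.2 is a one-line computation once the covariance and variances are known. The sketch below reproduces Examples 11.21 and 11.22 (the latter with the unrounded values V(X) = 11/150, V(Y) = 71/900 and Cov(X, Y) = -1/150 that the sick leave density gives) and checks the linear case Y = aX + b, where Cov(X, Y) = a·V(X) and V(Y) = a²·V(X). The helper name `correlation` and the slopes 3 and -0.5 are our own illustrations.

```python
import math


def correlation(cov, var_x, var_y):
    """Definition 11.2: rho_XY = Cov(X, Y) / sqrt(V(X) * V(Y))."""
    return cov / math.sqrt(var_x * var_y)


print(round(correlation(-13, 40, 25), 3))                   # Example 11.21
print(round(correlation(-1 / 150, 11 / 150, 71 / 900), 3))  # Example 11.22, unrounded inputs

# Linear case Y = aX + b: Cov(X, Y) = a V(X) and V(Y) = a^2 V(X), so rho = a / |a|.
var_x = 40.0
for a in (3.0, -0.5):
    print(a, correlation(a * var_x, var_x, a**2 * var_x))
```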


Although both of the correlation coefficients above are closer to 0 
than to 1, the implied association, however small, may be of some use. We 
have already noted that the relationship between the two assets X and Y 
may be useful in reducing risk. In practical situations, the interpretation of 
the correlation coefficient can be subtle. As we have mentioned previously, 
this is discussed more extensively in statistics texts. 


11.2.8 The Bivariate Normal Distribution 


There is a multivariate analogue of the normal distribution. This is important in advanced statistics, and we will briefly illustrate it by looking
at the two variable multivariate normal distribution. The density function
looks complicated at first glance. Two random variables X and Y have a
bivariate normal distribution if their joint density is of the form

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right]\right\}.$$

X and Y are also referred to as jointly normally distributed.

We will not look at the bivariate normal in depth, but it is nice
to note here that:

a) The marginal distribution of X is normal with mean $\mu_1$ and
standard deviation $\sigma_1$.

b) The marginal distribution of Y is normal with mean $\mu_2$ and
standard deviation $\sigma_2$.

c) The correlation coefficient between X and Y is ρ.
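One way to get a feel for the bivariate normal is to simulate it. The sketch below uses a standard construction that is not developed in this text: if Z1 and Z2 are independent standard normal random variables, then X = μ1 + σ1 Z1 and Y = μ2 + σ2(ρ Z1 + √(1 - ρ²) Z2) have the bivariate normal distribution above. The parameter values (μ1 = 10, μ2 = -2, σ1 = 3, σ2 = .5, ρ = -.6), sample size and seed are arbitrary illustrations, and both helper names are our own. The sample mean of X and the sample correlation should land near μ1 and ρ, in line with properties a) and c).

```python
import math
import random


def sample_bivariate_normal(mu1, mu2, sigma1, sigma2, rho, n, seed=0):
    """Draw n pairs (X, Y) with X = mu1 + sigma1 * Z1 and
    Y = mu2 + sigma2 * (rho * Z1 + sqrt(1 - rho^2) * Z2),
    where Z1 and Z2 are independent standard normals."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mu1 + sigma1 * z1
        y = mu2 + sigma2 * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        pairs.append((x, y))
    return pairs


def sample_correlation(pairs):
    """Sample analogue of Definition 11.2."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


pairs = sample_bivariate_normal(10, -2, 3, 0.5, rho=-0.6, n=100_000)
xs = [x for x, _ in pairs]
print(round(sum(xs) / len(xs), 2), round(sample_correlation(pairs), 2))  # near 10 and -0.6
```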


11.3 Moment Generating Functions for Sums of 
Independent Random Variables; 
Joint Moment Generating Functions 


11.3.1 The General Principle 


If X and Y are independent random variables, we can conclude that the
random variables $e^{tX}$ and $e^{tY}$ used in the definition of the moment
generating function are also independent. This gives a nice simplification for the moment generating function of X + Y.

$$M_{X+Y}(t) = E\left(e^{t(X+Y)}\right) = E\left(e^{tX} e^{tY}\right) \overset{\text{independence}}{=} E\left(e^{tX}\right) \cdot E\left(e^{tY}\right) = M_X(t) \cdot M_Y(t)$$

Moment Generating Function of X + Y
(X and Y Independent)

$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$$

This leads to a number of nice results about sums of random variables.


11.3.2 The Sum of Independent Poisson Random Variables 


The moment generating function of a Poisson random variable X with
parameter λ is

$$M_X(t) = e^{\lambda(e^t - 1)}.$$

If Y is Poisson with parameter β and Y is independent of X, the
moment generating function of X + Y is given by

$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t) = e^{\lambda(e^t - 1)} \cdot e^{\beta(e^t - 1)} = e^{(\lambda + \beta)(e^t - 1)}.$$

The final expression is the moment generating function of a Poisson
random variable with parameter (λ + β).

If X and Y are independent Poisson random variables with
parameters λ and β, then X + Y is Poisson with parameter (λ + β).
Example 11.23 In Example 10.2, the joint probability function
and marginal probability functions for X and Y (the numbers of accidents in two towns) were

$$p(x, y) = \frac{e^{-2}}{x!\,y!}, \quad \text{for } x = 0, 1, 2, \ldots \text{ and } y = 0, 1, 2, \ldots,$$

and

$$p_X(x) = \frac{e^{-1}}{x!}, \qquad p_Y(y) = \frac{e^{-1}}{y!}.$$

In this case, $p(x, y) = p_X(x) \cdot p_Y(y)$ and X and Y are independent
Poisson random variables with λ = 1. Thus X + Y is a Poisson random
variable with λ = 2. □
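The Poisson result can be cross-checked against the discrete convolution of Equation (11.2). This sketch convolves two Poisson(1) probability functions term by term and compares the result with the Poisson(2) probability function; `poisson_pmf` and `convolve_pmf` are our own helper names.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability function: e^(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def convolve_pmf(p_x, p_y, s):
    """Equation (11.2): P(X + Y = s) for independent, non-negative integer-valued X and Y."""
    return sum(p_x(x) * p_y(s - x) for x in range(s + 1))


for s in range(6):
    conv = convolve_pmf(lambda k: poisson_pmf(k, 1), lambda k: poisson_pmf(k, 1), s)
    print(s, round(conv, 6), round(poisson_pmf(s, 2), 6))
```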


11.3.3 The Sum of Independent and Identically Distributed 
Geometric Random Variables 


The moment generating function of a geometric random variable with
success probability p is

$$M_X(t) = \frac{p}{1 - (1 - p)e^t}.$$

If Y is also geometric with success probability p, then Y has the same
distribution as X. In this case X and Y are said to be identically
distributed. If Y is independent of X, the moment generating function
of X + Y is given by

$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t) = \left(\frac{p}{1 - (1 - p)e^t}\right)^2.$$

This is the moment generating function of a negative binomial distribution with success probability p and r = 2.

The sum of two independent and identically distributed geometric random variables with success probability p has a negative
binomial distribution with the same p and r = 2.


This is consistent with our interpretation of the geometric and 
negative binomial distributions. The geometric random variable repre- 
sents the number of failures before the first success in a series of 
independent trials. The sum of two independent geometric random 
variables would give the total number of failures before the second 
success which is represented by a negative binomial random variable 
with r = 2. 
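The same kind of check works for the geometric case. The sketch assumes the "failures before the first success" form of the geometric distribution, p(k) = p(1 - p)^k, and the matching negative binomial form C(k + r - 1, k) p^r (1 - p)^k; the value p = .3 is an arbitrary illustration and the helper names are our own. It uses `math.comb`, available in Python 3.8 and later.

```python
import math


def geometric_pmf(k, p):
    """Failures before the first success: P(X = k) = p (1 - p)^k, k = 0, 1, 2, ..."""
    return p * (1 - p) ** k


def neg_binomial_pmf(k, p, r):
    """Failures before the r-th success: C(k + r - 1, k) p^r (1 - p)^k."""
    return math.comb(k + r - 1, k) * p**r * (1 - p) ** k


def pmf_of_sum_of_two_geometrics(k, p):
    """Equation (11.2) applied to two independent geometric random variables."""
    return sum(geometric_pmf(x, p) * geometric_pmf(k - x, p) for x in range(k + 1))


p = 0.3
for k in range(5):
    print(k, round(pmf_of_sum_of_two_geometrics(k, p), 6), round(neg_binomial_pmf(k, p, r=2), 6))
```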


11.3.4 The Sum of Independent Normal Random Variables 


The moment generating function of a normal random variable with mean
μ and variance σ² is

$$M_X(t) = e^{\mu t + \sigma^2 t^2 / 2}.$$

If Y is normal with mean ν and variance τ² and Y is independent of X,
then the moment generating function of X + Y will be

$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t) = e^{\mu t + \sigma^2 t^2 / 2} \cdot e^{\nu t + \tau^2 t^2 / 2} = e^{(\mu + \nu)t + (\sigma^2 + \tau^2)t^2 / 2}.$$

The final expression is the moment generating function of a normal
random variable with mean μ + ν and variance σ² + τ².

If X and Y are independent normal random variables with
respective means μ and ν and respective variances σ² and τ², then
X + Y is normal with mean μ + ν and variance σ² + τ².
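Python's standard library happens to encode this result: adding two `statistics.NormalDist` objects returns the normal distribution whose mean is the sum of the means and whose variance is the sum of the variances, which its documentation presents as the sum of two independent normal random variables. The parameters below are arbitrary illustrations, and `normal_mgf` is our own helper used to compare M_X(t)·M_Y(t) with the moment generating function of the sum at a few values of t.

```python
import math
from statistics import NormalDist


def normal_mgf(t, mean, variance):
    """MGF of a normal random variable: exp(mean * t + variance * t^2 / 2)."""
    return math.exp(mean * t + variance * t**2 / 2)


X = NormalDist(mu=100, sigma=15)  # illustrative parameters only
Y = NormalDist(mu=40, sigma=8)
S = X + Y  # treated as the sum of two independent normal random variables

print(S.mean, S.variance)  # mean 100 + 40, variance 15^2 + 8^2
for t in (-0.05, 0.01, 0.02):
    product = normal_mgf(t, X.mean, X.variance) * normal_mgf(t, Y.mean, Y.variance)
    print(t, math.isclose(product, normal_mgf(t, S.mean, S.variance)))
```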


11.3.5 The Sum of Independent and Identically Distributed 
Exponential Random Variables 


The moment generating function of an exponential random variable with
parameter β is

$$M_X(t) = \frac{\beta}{\beta - t}, \quad t < \beta.$$

If Y is an identically distributed exponential random variable with
parameter β and Y is independent of X, the moment generating function
of X + Y is given by

$$M_{X+Y}(t) = M_X(t)\cdot M_Y(t) = \left(\frac{\beta}{\beta - t}\right)^2.$$

The final expression is the moment generating function of a gamma
random variable with parameters α = 2 and β.


If X and Y are independent and identically distributed exponential
random variables with parameter β, then X + Y is a gamma
random variable with parameters α = 2 and β.


Example 11.24 In Example 11.4 we looked at X and Y, the
waiting times between accidents in two towns. X and Y were
independent and identically distributed exponential random variables
with β = 1. In Example 11.4 we used convolutions to find the
distribution of X + Y, and showed that X + Y was a gamma random
variable with α = 2 and β = 1. The moment generating function result
above confirms this conclusion without requiring the work of
convolution integrals.


It is very important to keep in mind that these results rely upon the 
assumption of independence. The situation is much more complex when 
the random variables X and Y are not independent. 


11.3.6 Joint Moment Generating Functions 


In the one variable case, the moment generating function is defined by
M_X(t) = E[e^{tX}]. In the bivariate case the joint moment generating
function for X and Y is defined similarly as

$$M_{X,Y}(s,t) = E\left[e^{sX + tY}\right].$$


We will illustrate this with a simple discrete example. Let the joint
distribution for X and Y be given by the table below.

            x = 1    x = 2    p_Y(y)
y = 3        .2       .3       .5
y = 6        .4       .1       .5
p_X(x)       .6       .4

For this distribution

$$M_{X,Y}(s,t) = E\left[e^{sX+tY}\right] = .2e^{s+3t} + .3e^{2s+3t} + .4e^{s+6t} + .1e^{2s+6t}.$$


Recall that in the single variable case we can use derivatives of the
moment generating function to find moments of X using the relationship

$$M_X^{(n)}(0) = E(X^n).$$


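In the bivariate case we can use partial derivatives of the joint moment
generating function to get the expected values of mixed moments
involving powers of both X and Y. The key relationship is

$$\frac{\partial^{n+m} M_{X,Y}}{\partial s^n\,\partial t^m}(0,0) = E\left[X^n Y^m\right].$$

We will illustrate this in our example by using the joint moment
generating function to find

$$E[XY] = \frac{\partial^2 M_{X,Y}}{\partial s\,\partial t}(0,0).$$

$$\frac{\partial M_{X,Y}}{\partial t} = .2(3)e^{s+3t} + .3(3)e^{2s+3t} + .4(6)e^{s+6t} + .1(6)e^{2s+6t}$$

$$\frac{\partial^2 M_{X,Y}}{\partial s\,\partial t} = .2(1)(3)e^{s+3t} + .3(2)(3)e^{2s+3t} + .4(1)(6)e^{s+6t} + .1(2)(6)e^{2s+6t}$$

$$\frac{\partial^2 M_{X,Y}}{\partial s\,\partial t}(0,0) = .2(1)(3) + .3(2)(3) + .4(1)(6) + .1(2)(6) = 6$$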
You can check this result by calculating E(XY) directly.
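The two routes to E(XY) can also be compared numerically. This Python sketch hard-codes the joint table of this example, computes E(XY) directly, and approximates the mixed partial derivative of M_{X,Y} at (0, 0) with a central finite difference. The step size h is an arbitrary illustrative choice.

```python
import math

# Joint probability function from the table: (x, y) -> p(x, y)
joint = {(1, 3): 0.2, (2, 3): 0.3, (1, 6): 0.4, (2, 6): 0.1}

def joint_mgf(s, t):
    # M_{X,Y}(s, t) = E[e^{sX + tY}]
    return sum(p * math.exp(s * x + t * y) for (x, y), p in joint.items())

# Direct calculation: E(XY) = sum of x * y * p(x, y)
e_xy_direct = sum(p * x * y for (x, y), p in joint.items())

# Central finite-difference approximation of d^2 M / ds dt at (0, 0)
h = 1e-4
e_xy_mgf = (joint_mgf(h, h) - joint_mgf(h, -h)
            - joint_mgf(-h, h) + joint_mgf(-h, -h)) / (4 * h * h)

print(e_xy_direct, e_xy_mgf)
```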


Note that we can use the joint moment generating function to get the 
individual moment generating functions of X and Y. 


$$M_{X,Y}(s,0) = E\left(e^{sX+0Y}\right) = E\left(e^{sX}\right) = M_X(s)$$
$$M_{X,Y}(0,t) = E\left(e^{0X+tY}\right) = E\left(e^{tY}\right) = M_Y(t)$$


When X and Y are independent, the joint moment generating function is 
easy to find. 


$$M_{X,Y}(s,t) = E\left(e^{sX}e^{tY}\right) = E\left(e^{sX}\right)E\left(e^{tY}\right) = M_X(s)\,M_Y(t) \quad \text{(X and Y independent)}$$


11.4 The Sum of More Than Two Random Variables 
11.4.1 Extending the Results of Section 11.3 

The basic results of Section 11.3 can be extended to more than two
random variables by the same technique of multiplying moment
generating functions. The results and some examples are given below
without repeating the proofs.


If X₁, X₂, ..., Xₙ are independent Poisson random variables with
parameters λ₁, λ₂, ..., λₙ, then X₁ + X₂ + ... + Xₙ is Poisson with
parameter λ₁ + λ₂ + ... + λₙ.


Example 11.25 A company has three independent customer service 
locations. Calls come in to the three locations at average rates of 5, 7 and 
8 per minute. The number of calls per minute at each location is a 
Poisson random variable. Then the total number of calls at all three 
locations is a Poisson random variable with λ = 5 + 7 + 8 = 20. □


The sum of n independent and identically distributed geometric 


random variables with success probability p is a negative binomial 
random variable with the same p and r =n. 
Example 11.26 Four marksmen aim at a target. Each marksman
hits the target with probability p = .70 on each individual shot.
Individual shots are independent, and the marksmen are independent of each
other. Each fires until the first hit is made. For each marksman, the
number of misses before the first hit is a geometric random variable with
p = .70. The total number of misses for all four is a negative binomial
random variable with p = .70 and r = 4. □
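The marksmen example lends itself to simulation. The sketch below uses NumPy's random generator. NumPy's geometric sampler counts trials up to and including the first success, so 1 is subtracted to get misses. Its negative binomial sampler already counts failures before the rth success. The seed and number of trials are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
p, r, trials = 0.70, 4, 200_000

# NumPy's geometric sampler counts trials up to and including the first hit,
# so subtract 1 to get the number of misses before the first hit.
misses = rng.geometric(p, size=(trials, r)) - 1
total_misses = misses.sum(axis=1)    # total misses for all four marksmen

# NumPy's negative binomial sampler counts failures before the r-th success.
nb = rng.negative_binomial(r, p, size=trials)

print(total_misses.mean(), nb.mean(), r * (1 - p) / p)
print(np.mean(total_misses == 0), p ** r)   # P(no misses at all) = p^4
```

Both simulated means should be close to r(1 − p)/p, the mean of the negative binomial in its "failures before the rth success" form.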


If X₁, X₂, ..., Xₙ are independent normal random variables
with respective means μ₁, μ₂, ..., μₙ and respective variances σ₁²,
σ₂², ..., σₙ², then the sum X₁ + X₂ + ... + Xₙ is normal with mean
μ₁ + μ₂ + ... + μₙ and variance σ₁² + σ₂² + ... + σₙ².


Example 11.27 Three salesmen have variable annual incomes 
with means of fifty-five thousand, seventy thousand, and one hundred 
thousand dollars per year, respectively. The variance of income is 
$10,000 for each, and the incomes are independent normal random varia- 
bles. Then the total income of the three salesmen is a normal random 
variable with a mean of μ = 55,000 + 70,000 + 100,000 = $225,000
and a variance of 3(10,000) = $30,000. □


If X₁, X₂, ..., Xₙ are independent and identically distributed
exponential random variables with parameter β, then the sum
X₁ + X₂ + ... + Xₙ is a gamma random variable with parameters
α = n and β.


Example 11.28 The waiting time for the next customer at a
service station is exponential with an average waiting time of 2 minutes.
Since E(X) = 1/β = 2, the exponential parameter β is .5. Waiting times for
successive customers are independent and identically distributed. Then
the total waiting time for the fifth customer is a gamma random variable
with parameters α = 5 and β = .5. □
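A simulation sketch of Example 11.28, assuming NumPy. The exponential sampler is parameterized by its mean (scale = 1/β = 2). With β treated as a rate, as here, the gamma distribution with parameters α and β has mean α/β and variance α/β². For integer α its distribution function has the closed form 1 − Σ_{k<α} e^{−βt}(βt)^k/k!, which the code uses for comparison. The helper name is hypothetical.

```python
import math
import numpy as np

def gamma_cdf_integer_alpha(t, alpha, beta):
    # For integer alpha and rate beta:
    # P(T <= t) = 1 - sum_{k=0}^{alpha-1} e^{-beta t} (beta t)^k / k!
    bt = beta * t
    return 1 - sum(math.exp(-bt) * bt ** k / math.factorial(k) for k in range(alpha))

rng = np.random.default_rng(seed=5)
mean_wait, customers, trials = 2.0, 5, 400_000
beta = 1 / mean_wait     # E(X) = 1/beta = 2, so beta = .5
alpha = customers

# NumPy parameterizes the exponential by its mean: scale = 1/beta
waits = rng.exponential(scale=mean_wait, size=(trials, customers))
total = waits.sum(axis=1)    # total waiting time until the fifth customer

print(total.mean(), alpha / beta)        # gamma mean alpha/beta
print(total.var(), alpha / beta ** 2)    # gamma variance alpha/beta^2
print(np.mean(total <= 10), gamma_cdf_integer_alpha(10, alpha, beta))
```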
11.4.2 The Mean and Variance of X + Y + Z


In this section we will find the mean and variance of the sum of three 
random variables. This will enable us to see the pattern of the general 
result for the sum of n random variables. The results are based on use of 
the formulas for the sum of two random variables. 


$$\begin{aligned}
E[X + (Y+Z)] &= E(X) + E(Y+Z) = E(X) + E(Y) + E(Z)\\
V[X + (Y+Z)] &= V(X) + V(Y+Z) + 2\,\mathrm{Cov}(X, Y+Z)\\
&= V(X) + [V(Y) + V(Z) + 2\,\mathrm{Cov}(Y,Z)] + 2\,\mathrm{Cov}(X,Y) + 2\,\mathrm{Cov}(X,Z)
\end{aligned}$$


Mean and Variance of X + Y + Z

E(X + Y + Z) = E(X) + E(Y) + E(Z)

V(X + Y + Z) = V(X) + V(Y) + V(Z)
               + 2[Cov(X,Y) + Cov(X,Z) + Cov(Y,Z)]


Example 11.29 Let X, Y and Z be random variables, each with mean
20 and variance 3, and let Cov(X,Y) = Cov(X,Z) = Cov(Y,Z) = 1.

E(X + Y + Z) = 20 + 20 + 20 = 60
V(X + Y + Z) = 3 + 3 + 3 + 2[1 + 1 + 1] = 15 □
The general pattern is now easy to see. The expected value of a 
sum of random variables is the sum of their expected values. The 


variance of a sum of random variables is the sum of their variances plus 
twice the sum of their covariances. 


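Mean and Variance of X₁ + X₂ + ... + Xₙ

$$E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i)$$

$$V\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} V(X_i) + 2\sum_{i<j} \mathrm{Cov}(X_i, X_j)$$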
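The boxed formula is equivalent to adding up every entry of the covariance matrix. The diagonal holds the variances, and each off-diagonal covariance appears twice. A small Python sketch (the function name is hypothetical) applied to the numbers of Example 11.29:

```python
def variance_of_sum(cov):
    # cov[i][j] = Cov(X_i, X_j), with cov[i][i] = V(X_i)
    n = len(cov)
    variances = sum(cov[i][i] for i in range(n))
    covariances = sum(cov[i][j] for i in range(n) for j in range(i + 1, n))
    # V(sum) = sum of variances + 2 * sum of covariances over pairs i < j
    return variances + 2 * covariances

cov = [[3, 1, 1],
       [1, 3, 1],
       [1, 1, 3]]
print(variance_of_sum(cov))  # 15
```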
If all the random variables X₁, X₂, ..., Xₙ are independent, then
all covariance terms are 0. Then the variance of the sum is the sum of the
variances.

$$V\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} V(X_i) \quad \text{(independence)}$$


11.4.3 The Sum of a Large Number of Independent and 
Identically Distributed Random Variables 


In Section 8.4.4, we looked at an insurance company which had 1000
policies. The company was willing to assume that all of the policies
were independent, and that each policy loss amount had the same (non-
normal) distribution with

E(X) = 1000 and V(X) = 500,000.

Then the company was really responsible for 1000 random variables,
X₁, X₂, ..., X₁₀₀₀. The total claim loss S for the company was the sum
of the losses on all the individual policies, S = X₁ + X₂ + ... + X₁₀₀₀.
S was shown to be approximately normal (even though the individual
policy losses Xᵢ were not) using the Central Limit Theorem.

Central Limit Theorem Let X₁, X₂, ..., Xₙ be independent
random variables all of which have the same probability distribution and
thus the same mean μ and variance σ². If n is large, the sum

S = X₁ + X₂ + ... + Xₙ

will be approximately normal with mean nμ and variance nσ².

This theorem was stated without proof. The mean and variance of
S can now be derived.

$$\begin{aligned}
E(S) &= E(X_1 + X_2 + \cdots + X_n)\\
&= E(X_1) + E(X_2) + \cdots + E(X_n)\\
&= n\mu
\end{aligned}$$

$$\begin{aligned}
V(S) &= V(X_1 + X_2 + \cdots + X_n)\\
&= V(X_1) + V(X_2) + \cdots + V(X_n) \quad \text{(independence)}\\
&= n\sigma^2
\end{aligned}$$

This result enabled us to see that for the insurance company

E(S) = 1000 · 1000 = 1,000,000
and
V(S) = 1000 · 500,000 = 500,000,000.

It is more difficult to show that S is approximately normal, and we will not prove
that here. (One way to prove normality is based on moment generating
functions.) However, it is important to remember the result for applica-
tion. In many practical examples, the random variable being considered
is the sum of a large number of independent random variables, and
probabilities can be easily found as they were in Section 8.4.4.
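The Central Limit Theorem can be seen at work in a simulation. The loss distribution here is an assumption made only for illustration: a gamma distribution with shape 2 and scale 500. It is skewed, so clearly non-normal, but it has mean 1000 and variance 500,000. The Python/NumPy sketch simulates many totals S of 1000 such losses. It then checks the mean nμ, the variance nσ², and the share of totals lying within 1.96 standard deviations of the mean.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n, reps = 1000, 4000
mu, sigma2 = 1000.0, 500_000.0

# Assumed (hypothetical) loss distribution: gamma with shape 2 and scale 500,
# a skewed distribution with mean 2*500 = 1000 and variance 2*500**2 = 500,000.
losses = rng.gamma(shape=2.0, scale=500.0, size=(reps, n))
S = losses.sum(axis=1)              # one simulated total claim loss per row

mean_S, var_S = n * mu, n * sigma2  # n*mu and n*sigma^2
print(S.mean(), mean_S)
print(S.var(), var_S)

# Under the normal approximation about 95% of totals lie within 1.96 sd of n*mu
z = (S - mean_S) / np.sqrt(var_S)
coverage = np.mean(np.abs(z) < 1.96)
print(coverage)
```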


11.5 Double Expectation Theorems 

11.5.1 Conditional Expectations 

In this section we will return to the conditional expectations which were 
discussed in Section 10.3.3. We will use the joint probability function 


for two assets as our key example. 


Example 11.30 The joint distribution of two assets was given
with its marginal distributions in Example 10.3.

            x = 90   x = 100   x = 110   p_Y(y)
y = 0         .05       .27       .18      .50
y = 10        .15       .33       .02      .50
p_X(x)        .20       .60       .20

In Example 10.7 we found that E(X) = 100 and E(Y) = 5. In Example
10.16, we calculated the conditional distribution for X given the
information that Y = 0 by dividing each element of the top row of the
preceding table by p_Y(0) = .50. This gave us the conditional
distribution.
x          90     100     110
p(x|0)    .10     .54     .36
The conditional distribution was used to find the conditional expectation.
E(X|Y = 0) = 90(.10) + 100(.54) + 110(.36) = 102.60


We can repeat these steps to find the conditional distribution of X and
the conditional expected value of X given that Y = 10.

x          90     100     110
p(x|10)   .30     .66     .04


E(X|Y = 10) = 90(.30) + 100(.66) + 110(.04) = 97.4 


Up to this point, all of the material in this example has been review 
work. The new insight in this example comes from the observation that 
the two conditional expectations we have just calculated are values of a 
new random variable which depends on Y. We might see this more 
clearly if we create a probability table. 


y              0       10
p_Y(y)        .50      .50
E(X|Y = y)   102.6     97.4


The numerical quantity E(X|Y = y) depends on the chance event that
either Y = 0 or Y = 10 occurs. We can find the expected value of this
new random variable in the usual way.


Е[Е(Х|У)] = .50(102.6) + .50(97.4) = 100 = E(X) 


The above equality holds for any two random variables X and Y for which E(X) exists. □


Double Expectation Theorem for Expected Value 


E[E(X|Y)] = E(X)


E[E(Y|X)] = E(Y)


We will not give a proof. The reader will be asked to verify that
E[E(Y|X)] = E(Y) for the two asset example in Exercise 11-26. The
identity is very useful in applications in which only conditional
expectations are given.
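The calculation in Example 11.30 is short enough to script. This Python sketch stores the conditional distributions of X given Y and the distribution of Y from that example. It checks that averaging the conditional means reproduces E(X), computed from the joint probabilities p_Y(y)p(x|y).

```python
# Distribution of Y and conditional distributions of X given Y (Example 11.30)
p_y = {0: 0.50, 10: 0.50}
p_x_given_y = {
    0: {90: 0.10, 100: 0.54, 110: 0.36},
    10: {90: 0.30, 100: 0.66, 110: 0.04},
}

def conditional_mean(y):
    # E(X | Y = y) = sum over x of x * p(x | y)
    return sum(x * p for x, p in p_x_given_y[y].items())

# E[E(X|Y)]: average the conditional means using the distribution of Y
double_expectation = sum(p_y[y] * conditional_mean(y) for y in p_y)

# E(X) from the joint probabilities p(x, y) = p_Y(y) * p(x | y)
e_x = sum(p_y[y] * p * x for y in p_y for x, p in p_x_given_y[y].items())

print(round(conditional_mean(0), 2), round(conditional_mean(10), 2))  # 102.6 97.4
print(round(double_expectation, 2), round(e_x, 2))                    # 100.0 100.0
```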
Example 11.31 The probability that a claim is filed on an insurance
policy is .10. Only one claim may be filed. When a claim is filed,
the expected claim amount is $1000. (Claim amounts may vary.) A
policyholder is picked at random. Find the expected amount of claim
paid to that policyholder.

Solution Note that the expected amount paid to the randomly 
selected policyholder is not $1000; only 10% of the policyholders 
actually file claims. To solve this problem we need to identify random 
variables X and Y for the double expectation theorem. First, let Y be the 
number of claims filed by a policyholder. The probability function of Y 
is shown in the following table: 


y          0      1
p_Y(y)    .90    .10


Let X be the amount of claim paid. We are not given the joint
distribution of X and Y, but we are given (in words) the value of E(X|Y = 1).
It is the expected amount of $1000 paid if a claim is filed. If no claim is
filed, the amount paid is $0, so that is the value of E(X|Y = 0). Thus

E(X|Y = 0) = 0 and E(X|Y = 1) = 1000.

The average claim amount paid to any policyholder is
E[E(X|Y)] = .90(0) + .10(1000) = 100 = E(X). □
11.5.2 Conditional Variances 


Since the expected value of X is the expected value of the conditional 
means E(X|Y), the reader might expect the variance of X to be the 
expected value of conditional variances. However, the situation is a bit 
more complicated. We will illustrate it by continuing our analysis of the 
two asset distribution. 


Example 11.32 In Example 10.7 we found that V(X) = 40 and
V(Y) = 25. To find conditional variances for X, we will first find
E(X²|Y = y) and use the identity

V(X|Y = y) = E(X²|Y = y) − [E(X|Y = y)]².
When Y = 0, we have the following conditional distribution:

x          90     100     110
p(x|0)    .10     .54     .36

Then E(X²|Y = 0) = 90²(.10) + 100²(.54) + 110²(.36) = 10,566 and
V(X|Y = 0) = 10,566 − 102.6² = 39.24. When Y = 10, we have the
following conditional distribution:


x          90     100     110
p(x|10)   .30     .66     .04

Then E(X²|Y = 10) = 90²(.30) + 100²(.66) + 110²(.04) = 9514 and
V(X|Y = 10) = 9514 − 97.4² = 27.24. The conditional variance V(X|Y)
is also a random variable. A probability table for it is given below.

y              0       10
p_Y(y)        .50      .50
V(X|Y = y)   39.24    27.24


We can find the expected value of V(X|Y) from the information in the 
table. 


E[V(X|Y)] = 39.24(.50) + 27.24(.50) = 33.24


Note that E[V(X|Y)] does not equal the value of V(X) = 40. It is short 
by an amount of 40 — 33.24 = 6.76. However, we can account for the 
remaining 6.76. It is the variance of the values of the random variable 
E(X|Y). We repeat the table for this random variable below. 


y              0       10
p_Y(y)        .50      .50
E(X|Y = y)   102.6     97.4

The expected value of E(X|Y) was μ = 100. Then the variance of
E(X|Y) is

V[E(X|Y)] = (102.6 − 100)²(.50) + (97.4 − 100)²(.50) = 6.76.
Now we have two expressions whose sum is the variance of X.
V(X) = 40 = 33.24 + 6.76 = E[V(X|Y)] + V[E(X|Y)]
This identity always holds. □
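The same kind of sketch covers the variance decomposition. This Python code computes both components for the two asset example from the conditional distributions. It then compares their sum with V(X) computed directly from the joint probabilities p_Y(y)p(x|y).

```python
p_y = {0: 0.50, 10: 0.50}
p_x_given_y = {
    0: {90: 0.10, 100: 0.54, 110: 0.36},
    10: {90: 0.30, 100: 0.66, 110: 0.04},
}

def cond_moment(y, k):
    # E(X^k | Y = y)
    return sum(x ** k * p for x, p in p_x_given_y[y].items())

cond_mean = {y: cond_moment(y, 1) for y in p_y}
cond_var = {y: cond_moment(y, 2) - cond_mean[y] ** 2 for y in p_y}

e_x = sum(p_y[y] * cond_mean[y] for y in p_y)
within = sum(p_y[y] * cond_var[y] for y in p_y)                 # E[V(X|Y)]
between = sum(p_y[y] * (cond_mean[y] - e_x) ** 2 for y in p_y)  # V[E(X|Y)]

# V(X) directly: E(X^2) - E(X)^2 using the joint probabilities
e_x2 = sum(p_y[y] * cond_moment(y, 2) for y in p_y)
v_x = e_x2 - e_x ** 2

print(round(within, 2), round(between, 2), round(v_x, 2))  # 33.24 6.76 40.0
```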


Double Expectation Theorem for Variance 


V(X) = E[V(X|Y)] + V[E(X|Y)]


V(Y) = E[V(Y|X)] + V[E(Y|X)]


We will not give a proof of this identity. The reader will be asked to
verify that V(Y) = E[V(Y|X)] + V[E(Y|X)] for the two asset example
in Exercise 11-30. As we have already seen, this identity is useful in
situations where conditional means and variances are given without
additional information about the distribution.


Example 11.33 We return to the insurance Example 11.31. In that 
example we were given the information that the probability of a claim 
being filed by a policyholder is .10 and the expected amount of an 
individual claim (given that a claim is filed) is $1000. Suppose we are 
given that the variance of claim amount (given that a claim is filed) is 
$100. Find the variance of claim amount for a randomly selected policy- 
holder. 

Solution We have already identified the random variables 
involved. Y is the number of claims filed by a randomly selected 
policyholder, and X is the amount of claim paid to that policyholder. We 
have already found that E(X) = 100. To find V(X) we need to find the
two components: (a) E[V(X|Y)] and (b) V[E(X|Y)].


(a) Given that a claim is filed, the variance of claim amount is 
100. Thus V(X|Y = 1) = 100. If no claim is filed, the 
claim amount is the constant 0, so V(X|Y = 0) = 0. Then 


E[V(X|Y)] = .90(0) + .10(100) = 10. 


(b) The mean of the random variable E(X|Y) is E(X) = 100. 
Thus the variance is 
$$\begin{aligned}
V[E(X|Y)] &= (E(X|Y=0) - 100)^2(.90) + (E(X|Y=1) - 100)^2(.10)\\
&= (0 - 100)^2(.90) + (1000 - 100)^2(.10)\\
&= 100^2(.90) + 900^2(.10) = 90{,}000.
\end{aligned}$$


We can now find V(X). 


V(X) = E[V(X|Y)] + V[E(X|Y)]
     = 10 + 90,000 = 90,010 □
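This result can be checked by simulation, but only after choosing a full distribution for the claim amount, which Example 11.33 does not specify. Purely as an assumption for this Python/NumPy sketch, a filed claim is taken to be normal with mean 1000 and variance 100. Any distribution with those two conditional moments would give the same E(X) and V(X).

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 1_000_000

# Y = 1 (claim filed) with probability .10, otherwise Y = 0
filed = rng.random(n) < 0.10
# Assumption for illustration only: a filed claim is normal with mean 1000 and
# variance 100 (standard deviation 10); the example gives only these two moments.
amount = rng.normal(1000.0, 10.0, size=n)
paid = np.where(filed, amount, 0.0)   # X = 0 when no claim is filed

print(paid.mean())   # compare with E(X) = 100
print(paid.var())    # compare with V(X) = 90,010
```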


The student who has studied statistics may have seen the variance 
identity before. In the above example, the expected value E[V(X|Y)] is the 
mean of the variances within each of the two categories Y = 0 (no claim 
filed) and Y = 1 (1 claim filed). It is often referred to as the variance within
groups. The term V[E(X|Y)] is the variance of the means of the two groups
and is referred to as the variance between groups. 


11.6 Applying the Double Expectation Theorem; The 
Compound Poisson Distribution 


11.6.1 The Total Claim Amount for an Insurance Company: 
An Example of the Compound Poisson Distribution 


In previous chapters we have looked at insurance claims in two different 
ways. Using discrete distributions, we found the probability of the 
number of claims that might be experienced. The number of claims 
experienced is called the claim frequency. Using continuous distribu- 
tions, we found the probability of the amount of a single claim. The 
amount of a claim is called the claim severity. The insurance company's 
total experience depends on the combination of frequency and severity. 
This is illustrated in the next example. 


Example 11.34 Claims come in to an insurance office at an
average rate of 3 per day. The number of claims in a day is a Poisson
random variable N with mean λ = 3. Claim amounts are independent
of N and independent of other claim amounts. All claim amounts have
the same distribution. The ith claim Xᵢ is uniformly distributed on the
interval [0, 1000]. The experience in one series of five days is given in
the next table.
The variable of real importance to the company is the total amount of
claims that must be paid out. This random variable is denoted by S in
the table above. Note that the number of claims on different days varies,
so that the number of summands in the total varies from day to day. We
can write total claims as

S = X₁ + X₂ + ... + X_N.

S is a sum of a random number of random variables. It is referred to as a
compound Poisson random variable because the number of claims N
has a Poisson distribution. □


11.6.2 The Mean and Variance of a Compound Poisson 
Random Variable 


The double expectation theorems can be used to find the mean and
variance of a compound Poisson distribution. We will leave the derivation
for Section 11.6.3. First we will give the mean and variance formulas and
show how to use them in Example 11.34. One point of notation must be
settled first. Since the claim amounts Xᵢ are identically distributed, they are all
copies of the same random variable X and all have the same mean E(X)
and variance V(X).


Compound Poisson Random Variable
N Poisson, with parameter λ
Xᵢ = X independent and identically distributed, and independent of N
S = X₁ + X₂ + ... + X_N

E(S) = E(N) · E(X) = λ · E(X)

V(S) = λ · E(X²) = λ[V(X) + (E(X))²]
Example 11.35 For the insurance company in Example 11.34, the
number of claims N was Poisson with parameter λ = 3 = E(N). The
claim amount X was uniform on [0, 1000]. Thus

E(X) = 500

and

V(X) = 1000²/12.

The above formulas immediately show that

E(S) = 3(500) = 1500

and

V(S) = 3[1000²/12 + 500²] = 1,000,000.

There is a very natural intuitive interpretation for E(S). We expect an
average of 3 claims with an average amount of 500. The expected total is
3(500). □
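A simulation of Example 11.34 makes these formulas concrete. This Python/NumPy sketch draws a Poisson(3) number of claims for each simulated day and draws each claim uniformly on [0, 1000]. It then totals each day's claims with numpy.bincount. The seed and number of days are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
lam, days = 3.0, 200_000

counts = rng.poisson(lam, size=days)                     # N for each simulated day
amounts = rng.uniform(0.0, 1000.0, size=counts.sum())    # every claim amount, all days
day_of_claim = np.repeat(np.arange(days), counts)        # which day each claim belongs to
S = np.bincount(day_of_claim, weights=amounts, minlength=days)  # daily totals

print(S.mean())   # compare with E(S) = 1500
print(S.var())    # compare with V(S) = 1,000,000
```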


Example 11.36 A large insurance company has claims occur at a
rate of 1000 per month. The number of claims N is assumed to be
Poisson with λ = 1000. Claim amounts X are assumed to be indepen-
dent and identically distributed, with E(X) = 800 and V(X) = 10,000.
Then S, the total amount of all claims in a month, has a compound
Poisson distribution with

E(S) = 1000(800) = 800,000
and
V(S) = 1000[10,000 + 800²] = 650,000,000. □
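Because the two formulas only need λ, E(X) and V(X), they fit in a tiny helper. The function name below is hypothetical. It simply restates the boxed formulas and reproduces the figures of Examples 11.35 and 11.36.

```python
def compound_poisson_moments(lam, mean_x, var_x):
    # E(S) = lambda * E(X);  V(S) = lambda * E(X^2) = lambda * (V(X) + E(X)^2)
    return lam * mean_x, lam * (var_x + mean_x ** 2)

print(compound_poisson_moments(3, 500, 1000 ** 2 / 12))   # Example 11.35
print(compound_poisson_moments(1000, 800, 10_000))        # Example 11.36
```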


11.6.3 Derivation of the Mean and Variance Formulas 


We will begin by looking at some conditional expectations which will 
come up in the double expectation calculation. Recall that 
S = X₁ + X₂ + ... + X_N.
Then, because the claim amounts are independent of N, E(S|N) can be written as a sum

$$E(S|N) = E(X_1 + X_2 + \cdots + X_N \mid N) = E(X_1) + E(X_2) + \cdots + E(X_N) = N\cdot E(X).$$
Since the claim amounts are independent of each other and of N, the
conditional variance of the sum is the sum of the variances.

$$V(S|N) = V(X_1 + X_2 + \cdots + X_N \mid N) = V(X_1) + V(X_2) + \cdots + V(X_N) = N\cdot V(X)$$

Now we have all necessary information to use the double expectation
theorems. Recall that for a Poisson random variable E(N) = V(N) = λ.

$$\begin{aligned}
E(S) &= E[E(S|N)] = E[N\cdot E(X)] = E(X)\cdot E(N) = \lambda\cdot E(X)\\
V(S) &= E[V(S|N)] + V[E(S|N)]\\
&= E[N\cdot V(X)] + V[N\cdot E(X)]\\
&= V(X)\cdot E(N) + (E(X))^2\cdot V(N)\\
&= \lambda\cdot V(X) + \lambda\cdot (E(X))^2\\
&= \lambda\cdot E(X^2)
\end{aligned}$$


11.6.4 Finding Probabilities for the Compound Poisson S 
by a Normal Approximation 


The mean and variance formulas in the preceding sections are useful, but 
in insurance risk management it is important to be able to find probabili- 
ties for the compound Poisson S as well as the mean and variance. 
Methods for this have been developed, and the actuarial student can find 
them in Chapter 12 of Bowers et al. [2]. Those methods will not be 
covered in this text. However, there is a special case in which probabilities 
for S can be approximated by a normal distribution with the same mean 
and variance. This is the case in which the Poisson mean λ is very large.


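Normal Approximation to the Compound Poisson for Large λ

If S = X₁ + X₂ + ... + X_N has a compound Poisson distribution,
then the distribution of S approaches a normal distribution with
mean λ · E(X) and variance λ · E(X²) as λ → ∞. (More precisely, the
standardized variable (S − λE(X))/√(λE(X²)) converges in distribution
to the standard normal.)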


We will not give a proof here. (The interested reader is referred to 
Bowers et al. [2], page 386.) The next example shows how it can be 
applied for an insurance company with a large claim rate λ.
Example 11.37 In Example 11.36 we looked at an insurance
company with the large claim rate λ = 1000. We showed that the
compound Poisson claim total S had mean E(S) = 800,000 and
variance V(S) = 650,000,000. Thus the standard deviation of S is
√650,000,000 ≈ 25,495. Suppose the company has $850,000 available
to pay claims and wants to know the probability that this will be enough
to pay all claims that come in. This is the probability P(S < 850,000).
We can find it using the normal approximation above.

$$P(S < 850{,}000) \approx P\left(Z < \frac{850{,}000 - 800{,}000}{25{,}495}\right) = P(Z < 1.96) = .9750$$ □
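The same normal approximation can be computed without a table. This Python sketch uses the standard library's math.erf to evaluate the standard normal distribution function. It keeps z unrounded, so the result can differ from the table value .9750 in the fourth decimal place.

```python
import math

def normal_cdf(z):
    # Standard normal distribution function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean_s, var_s = 800_000, 650_000_000   # E(S) and V(S) from Example 11.36
funds = 850_000                        # money available to pay claims
z = (funds - mean_s) / math.sqrt(var_s)
print(round(z, 4), round(normal_cdf(z), 4))
```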


11.7 Exercises 
11.1 Distributions of Functions of Two Random Variables 


11-1. Let p(x,y) be the joint probability function of Exercise 10-1,
and let S = X + Y. Find the probability function p_S(s).


11-2. Let f(x,y) = UG, for 0 < x < 1, 0 < y < 1. Find
P(X + Y < 1).


11-3. Let X and Y be independent random variables with marginal
density functions f_X(x) = 2e^{−2x}, for x > 0, and
f_Y(y) = 3e^{−3y}, for y > 0, and let S = X + Y. Find f_S(s).


11-4. For the joint density function given in Example 11.3, find
P(X + Y ≤ 1.5). Hint: Find P(X + Y > 1.5) first.


11-5. Let f(x,y) be the joint density function given in Example 11.4,
and let S = X + Y. Use a double integral to find F_S(s), take the
derivative of this to get f_S(s), and compare with Example 11.4.


11-6. Let X and Y be the independent random variables in Exercise
10-6. Find P(min(X,Y) > t), for 0 < t < 1. Note: X and Y
are not exponential random variables.


11.2 Expected Values of Functions of Random Variables

11-7. For the random variables in Exercise 10-1, find E(X + Y) using
the joint probabilities in the table. Then find E(X + Y) using
the function p_S(s) found in Exercise 11-1. Show that each of
these is equal to E(X) + E(Y), as found in Exercise 10-3.

11-8. Let f(x,y) = 1520), for 0 < x < 1 and 0 < y < 1, as in
Exercise 11-2. Find E(X + Y) using the joint density function.
Show that this is equal to E(X) + E(Y).

11-9. Prove that E(X + Y) = E(X) + E(Y) for continuous random
variables.

11-10. For the random variables in Example 11.11, find E(XY) directly.

11-11. For the random variables in Exercise 11-8, find (a) E(XY);
(b) E(X) · E(Y); (c) Cov(X, Y).

11-12. For the random variables in Exercise 11-8, find (a) V(X);
(b) V(Y); (c) V(X + Y).

11-13. For the random variables in Exercise 10-1, find V(X + Y).

11-14. Let X and Y be random variables whose joint probability
distribution and marginal distributions are given below.

Find (a) E(X); (b) E(Y); (c) V(X); (d) V(Y); (e) Cov(X, Y);
(f) V(X + Y).

11-15. Let X and Y be the random variables in Exercise 10-22 with
joint density function f(x,y) = 6x, for 0 < x < y < 1, and
f(x,y) = 0 elsewhere. Find (a) V(X); (b) V(Y); (c) E(XY);
(d) V(X + Y).
11-16. For the random variables given in Exercise 11-14, find the
correlation coefficient.

11-17. For the random variables given in Exercise 11-15, find the
correlation coefficient.

11-18. Let X and Y be random variables with joint density function
f(x,y) = x + y, for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, and f(x,y) = 0
elsewhere. Find the correlation coefficient.

11-19. Let X and Y be random variables whose joint density function
is f(x,y) = a, for −1 < x < 1 and −1 < y < 1, and
f(x,y) = 0 elsewhere.

(a) Find f_X(x) and f_Y(y), and show that X and Y are not
independent.

(b) Find E(X), E(Y), E(XY) and Cov(X, Y).

11.3 Moment Generating Functions for Sums of
Independent Random Variables

11-20. Let X and Y be independent random variables with joint
probability function f(x,y) = x(y + 1)/15, for x = 1, 2 and y = 1, 2.
Find M_{X+Y}(t).

11-21. Let X and Y be independent random variables, each uniformly
distributed over [0, 2]. Find M_{X+Y}(t).

11.4 The Sum of More Than Two Random Variables

11-22. The random variable S representing the sum of n fair dice is the
sum of n independent random variables Xᵢ, i = 1, 2, ..., n,
where Xᵢ represents the number of dots on the toss of the ith die.
Find E(S) and V(S).

11-23. Let X₁, X₂, X₃ and X₄ be random variables such that for each i,
V(Xᵢ) = 13/162, and for i ≠ j, Cov(Xᵢ, Xⱼ) = −1/81. Find
V(X₁ + X₂ + X₃ + X₄).
11-24. Let S = X₁ + X₂ + ... + Хо be the sum of random variables
such that V(S) = 500/9, V(Xᵢ) = 25/3 for each i, and all
covariances, for i ≠ j, are the same. Find Cov(Xᵢ, Xⱼ).

11-25. Let S = X₁ + X₂ + ... + X₅₀₀, where the Xᵢ are independent
and identically distributed with mean .50 and variance .25. Use
the Central Limit Theorem to find P(235 < S < 265).

11.5 Double Expectation Theorems


Exercises 11-26 through 11-30 refer to the random variables and
distributions in Examples 11.30 and 11.32.

11-26. Find (a) E(Y|X = 90); (b) E(Y|X = 100); (c) E(Y|X = 110).

11-27. Find E[E(Y|X)].

11-28. Find (a) V(Y|X = 90); (b) V(Y|X = 100); (c) V(Y|X = 110).

11-29. Find E[V(Y|X)].

11-30. Find V[E(Y|X)], and verify the identity

E[V(Y|X)] + V[E(Y|X)] = V(Y).

11-31. The probability that a claim is filed on an insurance policy is
.07, and at most one claim is filed in a year. Claim amounts are
for either $500, $1000 or $2000. Given that a claim is filed, the
distribution of claim amounts is P(500) = .60, P(1000) = .30
and P(2000) = .10. Find the variance of the claim amount paid
to a randomly selected policyholder. (Recall that some
policyholders do not file a claim and are paid nothing.)


Exercises 11-32 through 11-36 refer to the random variables in Exercise
10-24, whose joint density function is f(x,y) = 6x, for 0 < x < y < 1,
and f(x,y) = 0 elsewhere.

11-32. Find (a) f_X(x); (b) E(X); (c) V(X).
11-33. Find E[E(X|Y)]. (This should be equal to E(X).)

11-34. Find V(X|Y = y).

11-35. Find E[V(X|Y)].

11-36. Find V[E(X|Y)]. Verify that E[V(X|Y)] + V[E(X|Y)] = V(X).

11.6 Applying the Double Expectation Theorem;
The Compound Poisson Distribution

11-37. The number of claims received by an insurance company in a
month is a Poisson random variable with λ = 20. The claim
amounts are independent of each other, and each is uniformly
distributed over [0, 500]. S is the random variable for the total
amount of claims paid. Find (a) E(S); (b) V(S).

11-38. Let the claim amounts in Exercise 11-37 have a lognormal
distribution, whose underlying normal distribution has μ = 5 and
σ = 40. Find (a) E(S); (b) V(S).


Use the normal approximation to the compound Poisson distribution in
Exercises 11-39 and 11-40.

11-39. The number of claims received in a year by an insurance
company is a Poisson random variable with λ = 500. The claim
amounts are independent and uniformly distributed over
[0, 500]. If the company has $140,000 available to pay claims,
what is the probability that it will have enough to pay all the
claims that come in?

11-40. The number of claims received in a year by an insurance
company is a Poisson random variable with λ = 500. The claim
amount distribution has mean E(X) = 600 and variance
V(X) = 12,000. What is the minimum amount the company
would need so that it would have a .95 probability of being able
to pay all claims? (Use the fact that F_Z(1.645) ≈ .95.)
Sample Actuarial Examination Problems

11-41. An insurance company determines that N, the number of claims
received in a week, is a random variable with P[N = n] = 1/2^{n+1},
where n ≥ 0. The company also determines that the number of
claims received in a given week is independent of the number of
claims received in any other week.

Determine the probability that exactly seven claims will be
received during a given two-week period.

11-42. A company agrees to accept the highest of four sealed bids on a
property. The four bids are regarded as four independent random
variables with common cumulative distribution function

F(x) = ½(1 + sin πx) for 3/2 ≤ x ≤ 5/2.

Which of the following represents the expected value of the
accepted bid?

(A) $\pi \int_{3/2}^{5/2} x \cos \pi x \, dx$
(B) $\frac{1}{16} \int_{3/2}^{5/2} (1 + \sin \pi x)^4 \, dx$
(C) $\frac{1}{16} \int_{3/2}^{5/2} x (1 + \sin \pi x)^4 \, dx$
(D) $\frac{1}{4}\pi \int_{3/2}^{5/2} \cos \pi x \, (1 + \sin \pi x)^3 \, dx$
(E) $\frac{1}{4}\pi \int_{3/2}^{5/2} x \cos \pi x \, (1 + \sin \pi x)^3 \, dx$

11-43. Claim amounts for wind damage to insured homes are
independent random variables with common density function

f(x) = 3/x⁴ for x > 1, and f(x) = 0 otherwise,

where x is the amount of a claim in thousands.

Suppose 3 such claims will be made. What is the expected value
of the largest of the three claims?
11-44. An insurance company insures a large number of drivers. Let X
be the random variable representing the company's losses under
collision insurance, and let Y represent the company's losses
under liability insurance. X and Y have joint density function

f(x,y) = (2x + 2 − y)/4 for 0 < x < 1 and 0 < y < 2, and
f(x,y) = 0 otherwise.

What is the probability that the total loss is at least 1?

11-45. A family buys two policies from the same insurance company.
Losses under the two policies are independent and have
continuous uniform distributions on the interval from 0 to 10. One
policy has a deductible of 1 and the other has a deductible of 2.
The family experiences exactly one loss under each policy.

Calculate the probability that the total benefit paid to the family
does not exceed 5.

11-46. Let T₁ be the time between a car accident and reporting a claim
to the insurance company. Let T₂ be the time between the report
of the claim and payment of the claim. The joint density function
of T₁ and T₂, f(t₁, t₂), is constant over the region 0 < t₁ < 6,
0 < t₂ < 6, t₁ + t₂ < 10, and zero otherwise.

Determine E[T₁ + T₂], the expected time between a car accident
and payment of the claim.

11-47. Let T₁ and T₂ represent the lifetimes in hours of two linked
components in an electronic device. The joint density function
for T₁ and T₂ is uniform over the region defined by
0 ≤ t₁ ≤ t₂ ≤ L, where L is a positive constant.

Determine the expected value of the sum of the squares of T₁
and T₂.
11-48. In a small metropolitan area, annual losses due to storm, fire, and theft are assumed to be independent, exponentially distributed random variables with respective means 1.0, 1.5, and 2.4.

Determine the probability that the maximum of these losses exceeds 3.

11-49. A company offers earthquake insurance. Annual premiums are modeled by an exponential random variable with mean 2. Annual claims are modeled by an exponential random variable with mean 1. Premiums and claims are independent.

Let X denote the ratio of claims to premiums.

What is the density function of X?

11-50. Let X and Y be the number of hours that a randomly selected person watches movies and sporting events, respectively, during a three-month period. The following information is known about X and Y:

E(X) = 50    Var(X) = 50    E(Y) = 20
Var(Y) = 30    Cov(X, Y) = 10

One hundred people are randomly selected and observed for these three months. Let T be the total number of hours that these one hundred people watch movies or sporting events during this three-month period.

Approximate the value of P(T < 7100).

11-51. The profit for a new product is given by $Z = 3X - Y - 5$. X and Y are independent random variables with Var(X) = 1 and Var(Y) = 2. What is the variance of Z?
11-52. A company has two electric generators. The time until failure for each generator follows an exponential distribution with mean 10. The company will begin using the second generator immediately after the first one fails.

What is the variance of the total time that the generators produce electricity?

11-53. A joint density function is given by

$$f(x, y) = \begin{cases} k & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

where k is a constant. What is Cov(X, Y)?
11-54. Let X and Y be continuous random variables with joint density function

$$f(x, y) = \begin{cases} \tfrac{8}{3}\,xy & \text{for } 0 < x < 1,\ x < y < 2x \\ 0 & \text{otherwise} \end{cases}$$

Calculate the covariance of X and Y.

11-55. Let X and Y denote the values of two stocks at the end of a five-year period. X is uniformly distributed on the interval (0, 12). Given X = x, Y is uniformly distributed on the interval (0, x).

Determine Cov(X, Y) according to this model.
11-56. An actuary determines that the claim size for a certain class of accidents is a random variable, X, with moment generating function

$$M_X(t) = \frac{1}{(1 - 2500t)^4}$$

Determine the standard deviation of the claim size for this class of accidents.

11-57. A company insures homes in three cities, J, K, and L. Since sufficient distance separates the cities, it is reasonable to assume that the losses occurring in these cities are independent.

The moment generating functions for the loss distributions of the cities are:

$$M_J(t) = (1 - 2t)^{-3} \qquad M_K(t) = (1 - 2t)^{-2.5} \qquad M_L(t) = (1 - 2t)^{-4.5}$$

Let X represent the combined losses from the three cities. Calculate $E(X^3)$.

11-58. An insurance policy pays a total medical benefit consisting of two parts for each claim.

Let X represent the part of the benefit that is paid to the surgeon, and let Y represent the part that is paid to the hospital. The variance of X is 5000, the variance of Y is 10,000, and the variance of the total benefit, X + Y, is 17,000.

Due to increasing medical costs, the company that issues the policy decides to increase X by a flat amount of 100 per claim and to increase Y by 10% per claim.

Calculate the variance of the total benefit after these revisions have been made.
11-59. Let X denote the size of a surgical claim and let Y denote the size of the associated hospital claim. An actuary is using a model in which $E(X) = 5$, $E(X^2) = 27.4$, $E(Y) = 7$, $E(Y^2) = 51.4$, and $\text{Var}(X + Y) = 8$.

Let $C_1 = X + Y$ denote the size of the combined claims before the application of a 20% surcharge on the hospital portion of the claim, and let $C_2$ denote the size of the combined claims after the application of that surcharge.

Calculate $\text{Cov}(C_1, C_2)$.

11-60. Claims filed under auto insurance policies follow a normal distribution with mean 19,400 and standard deviation 5,000.

What is the probability that the average of 25 randomly selected claims exceeds 20,000?

11-61. A company manufactures a brand of light bulb with a lifetime in months that is normally distributed with mean 3 and variance 1. A consumer buys a number of these bulbs with the intention of replacing them successively as they burn out. The light bulbs have independent lifetimes.

What is the smallest number of bulbs to be purchased so that the succession of light bulbs produces light for at least 40 months with probability at least 0.9772?

11-62. An insurance company sells a one-year automobile policy with a deductible of 2.

The probability that the insured will incur a loss is .05. If there is a loss, the probability of a loss of amount N is K/N, for N = 1, 2, ..., 5 and K a constant. These are the only possible loss amounts and no more than one loss can occur.

Determine the net premium for this policy.
11-63. An auto insurance company insures an automobile worth 15,000 for one year under a policy with a 1,000 deductible. During the policy year there is a .04 chance of partial damage to the car and a .02 chance of a total loss of the car. If there is partial damage to the car, the amount X of damage (in thousands) follows a distribution with density function

$$f(x) = \begin{cases} 0.5003\,e^{-x/2} & \text{for } 0 < x < 15 \\ 0 & \text{otherwise} \end{cases}$$

What is the expected claim payment?


Chapter 12 
Stochastic Processes 


12.1 Simulation Examples 


In many situations it is important to study a series of random events over 
time. Insurance companies accumulate a series of claims over time. 
Investors see their holdings increase or decrease over time as the stock 
market or interest rates fluctuate. These processes in which random 
events affect variables over time are called stochastic processes. In this 
section we will give a number of examples of stochastic processes. Each 
example will contain simulation results designed to give the reader an 
intuitive understanding of the process. 


12.1.1 Gambler's Ruin Problem 
We return to the gambling roots of probability for our first example. 


Example 12.1 Two gamblers, A and B, are betting on tosses of a 
fair coin. The two gamblers have four coins between them: A has 3 
coins and B has 1. On each play, one of the players tosses one of his 
coins and calls heads or tails while the coin is in the air. If his call is 
correct, he gets a coin from the other player. Otherwise, he loses his 
coin to the other player. The players continue the game until one player 
has all the coins. 

Solution Intuitively, it seems that A would be more likely to end 
up with all the coins, since A starts with more coins. We can test this 
hypothesis experimentally with a computer simulation. The probability 
that A wins on any single toss is P(H) = P(T) = .50. We can simulate 
tosses of the coin by generating a random number in [0, 1) and giving A 
a loss if the number is in [0, .5) and a win if the number is in [.5, 1). The
result of one simulation of the game is shown below. 


Play     Random Number     A has
Begin                        3
1        0.007510            2
2        0.126708            1
3        0.614643            2
4        0.621189            3
5        0.913130            4


In this game, A had two losses in a row but was able to recover with 
three wins in a row to get all 4 coins. It is less likely that A will lose, but 
that is possible. The next simulation shows a series of plays in which B 
ended up with all 4 coins and A with none. 


Play     Random Number     A has
Begin                        3
1        0.425238            2
2        0.971694            3
3        0.217407            2
4        0.362054            1
5        0.942864            2
6        0.076474            1
7        0.262251            0


Any time this game is played, one player will eventually get all of the 
coins. The process is random in any single game, but if a large number 
of such games is played, an interesting pattern emerges. We used the 
computer to play this game to completion 100 times. In that series of 
simulations, Player A won 75 times and Player B won 25 times. It 
appears that the player who starts with 75% of the coins has a 75% 
probability of winning all the money, but our simulation only tells us 
that this might be true; it does not tell us that this must be true. We 
repeated the experiment of 100 games a number of times, and found that
in each sequence of plays the number of wins for A was near (but not 
exactly equal to) 75. In Section 12.2 we will develop some theory to 
prove that P(A wins all coins) = .75. 
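The simulation described above can be sketched in Python. This is a minimal illustration, not the program that produced the runs reported in the text; the function names and the seed are our own choices. It applies the same rule as the tables: a random number in [0, .5) is a loss for A and one in [.5, 1) is a win.

```python
import random


def play_gamblers_ruin(a_coins, total_coins, rng):
    """Play one fair-coin game; return True if A ends with all the coins.

    A random number in [0, .5) is a loss for A and one in [.5, 1) is a win,
    matching the rule used for the simulation tables.
    """
    coins = a_coins
    while 0 < coins < total_coins:
        if rng.random() < 0.5:
            coins -= 1  # A loses a coin to B
        else:
            coins += 1  # A wins a coin from B
    return coins == total_coins


def estimate_win_probability(a_coins, total_coins, games, seed=None):
    """Fraction of `games` independent games that A wins."""
    rng = random.Random(seed)
    wins = sum(play_gamblers_ruin(a_coins, total_coins, rng) for _ in range(games))
    return wins / games


if __name__ == "__main__":
    # A starts with 3 of the 4 coins; the estimate steadies as games increase.
    for games in (100, 10_000, 100_000):
        print(games, estimate_win_probability(3, 4, games, seed=1))
```

With only 100 games the estimate can easily land a few percentage points away from .75, which is why repeating the 100-game experiment several times is informative.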
This problem is called the gambler's ruin problem because one of the gamblers will always lose all of his money. Theory can be developed to show that if each toss is fair, A starts with a coins, and B starts with b coins, then

$$P(\text{A wins all coins}) = \frac{a}{a+b}.$$

For example, when A has 10,000,000 coins and B has 200, the probability that A wins all of the coins and B leaves with nothing is

$$\frac{10{,}000{,}000}{10{,}000{,}200} \approx .99998.$$

This is useful to remember when you are B entering a casino. □
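The formula can be checked against a first-step analysis for a fair coin. If $h_i$ denotes A's chance of winning everything while holding i coins, then $h_0 = 0$, $h_{a+b} = 1$, and $h_i = (h_{i-1} + h_{i+1})/2$ in between. The sketch below (helper names are ours) runs that recursion in exact fractions and compares it with $a/(a+b)$. It also evaluates the casino numbers.

```python
from fractions import Fraction


def win_prob_formula(a, b):
    """P(A wins all coins) = a / (a + b), the fair-coin formula from the text."""
    return Fraction(a, a + b)


def win_prob_first_step(a, b):
    """Exact P(A wins all coins) for a fair coin, from first-step analysis.

    h[i] = P(A wins | A holds i coins) satisfies h[0] = 0, h[n] = 1 and
    h[i] = (h[i-1] + h[i+1]) / 2, i.e. h[i+1] = 2*h[i] - h[i-1].
    With h[0] = 0 every h[i] is a multiple of h[1], so we run the recursion
    with h[1] = 1 and then rescale so that h[n] = 1.
    """
    n = a + b
    h = [Fraction(0), Fraction(1)]
    while len(h) <= n:
        h.append(2 * h[-1] - h[-2])
    return h[a] / h[n]


if __name__ == "__main__":
    print(win_prob_formula(3, 1), win_prob_first_step(3, 1))
    casino = win_prob_formula(10_000_000, 200)
    print(casino, float(casino))
```

The recursion forces $h_i$ to grow linearly in i, and this is exactly why the answer is the fraction of coins A starts with.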
12.1.2 Fund Switching 


Example 12.2 Employees in a pension plan have their money 
invested in one of two funds which we will call Fund 0 and Fund 1. 
Each month they are allowed to switch to the other fund if they feel that 
it may perform better. For investors in Fund 0, the probability of staying 
in Fund 0 is .55 and the probability of moving to Fund 1 is .45. For 
investors in Fund 1, the probability of a switch to Fund 0 is .30 and the 
probability of staying in Fund 1 is .70. We can summarize this in the following table of probabilities.

                 To Fund 0   To Fund 1
From Fund 0        .55         .45
From Fund 1        .30         .70


We can simulate the progress of a single employee over time as follows: 


(a) Generate a random number from (0, 1). 

(b) If the employee is in Fund 0 now, keep the employee in 
Fund 0 if the random number is in [0, .55). Otherwise switch 
the employee to Fund 1. 

(c) If the employee is in Fund 1 now, switch the employee to 
Fund 0 if the random number is in [0, .30). Otherwise keep 
the employee in Fund 1. 
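Steps (a) through (c) translate directly into code. The Python sketch below is one possible implementation. The function names and the seed are our own choices, so its output will not match the particular run reported in the text. Simulating many months gives a long-run share of time in Fund 1 that can be compared with the figure the text reports.

```python
import random

# Switching probabilities from Example 12.2.
STAY_IN_FUND_0 = 0.55    # P(Fund 0 -> Fund 0)
SWITCH_TO_FUND_0 = 0.30  # P(Fund 1 -> Fund 0)


def next_fund(current, u):
    """Steps (b) and (c): move one month ahead given a random number u in [0, 1)."""
    if current == 0:
        return 0 if u < STAY_IN_FUND_0 else 1
    return 0 if u < SWITCH_TO_FUND_0 else 1


def simulate_employee(start_fund, months, seed=None):
    """Return the list of funds occupied, starting month first."""
    rng = random.Random(seed)
    path = [start_fund]
    for _ in range(months):
        path.append(next_fund(path[-1], rng.random()))  # step (a), then (b)/(c)
    return path


def share_of_months_in_fund_1(path):
    """Fraction of the simulated months (excluding the start) spent in Fund 1."""
    months = path[1:]
    return sum(months) / len(months)


if __name__ == "__main__":
    print(simulate_employee(1, 6, seed=3))
    print(share_of_months_in_fund_1(simulate_employee(1, 100_000, seed=3)))
```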
One such simulation for 6 months gave the following results for an employee starting in Fund 1:

[Table: month (Begin, 1 to 6), random number, and fund, for an employee starting in Fund 1]
As with the gambler’s ruin example, there is a long-run pattern to be 
found. We simulated this process for 100 months at a time, and found 
that a typical employee was in Fund 1 approximately 60% of the time. 


We will be able to use theory in Section 12.2 to prove that this must happen. □


12.1.3 A Compound Poisson Process 


The crucial process for an insurance company is to observe the frequency 
and severity of claims day by day. On each day a random number of claims 
for random amounts comes in. The company must manage the risk of its 
total claims S over time. If the number of claims N is Poisson, and the 
claim amounts X are independent of each other and of N, then S follows a 
compound Poisson distribution. We have already given a simulation 
example for such a process in Chapter 11. In Example 11.34 the number of 
claims in a day was a Poisson random variable N with mean λ = 3. Claim
amounts X were independent, as required. The $i$th claim $X_i$ was uniformly
distributed on the interval [0,1000]. The experience in one series of five 
days was the following: 


[Table: results of the five simulated days, including the number of claims N]

This is only one simulation of the process for a small number of days. Theory can also be used here to develop useful patterns for risk management, but that theory will not be studied in this text.
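The daily simulation can be sketched as follows, assuming the parameters of Example 11.34: λ = 3 claims per day, with each claim uniform on [0, 1000]. The helper names are ours. The Poisson draws use the classical product-of-uniforms method, which may not be the method the authors used, so the output will not match the table. As a rough check, the long-run average of S should be near E[N]E[X] = 3 × 500 = 1500, a standard property of compound distributions.

```python
import math
import random


def poisson_sample(lam, rng):
    """Poisson(lam) draw: multiply uniforms until the product falls to e^(-lam).

    Classical product-of-uniforms method; adequate for small lam.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_day(rng, lam=3.0, max_claim=1000.0):
    """One day: N ~ Poisson(lam) claims, each uniform on [0, max_claim]; return (N, S)."""
    n = poisson_sample(lam, rng)
    claims = [rng.uniform(0.0, max_claim) for _ in range(n)]
    return n, sum(claims)


if __name__ == "__main__":
    rng = random.Random(11)
    for day in range(1, 6):
        n, s = simulate_day(rng)
        print(f"day {day}: N = {n}, S = {s:.2f}")
```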


12.1.4 A Continuous Process: Simulating Exponential Waiting Times


All of the previous stochastic processes were recorded for discrete time 
periods. The plays or months were indexed using the positive integers 
1,2,3,.... Other stochastic processes occur in continuous time. For 
example, the exact waiting time for the next accident at an intersection 
can be any real number. The reader might recall that the waiting time T
for the next accident at an intersection can be modeled using an exponential random variable. This is illustrated in the next example.


Example 12.3 The waiting time T (in months) between accidents
at an intersection is exponential with λ = 2. We can simulate values of
this random variable using the inverse transformation method from 
Section 9.5.2. The following table contains the result of a simulation of 
the waiting time for the next 5 accidents at the intersection. 


                                   Time to Next
Trial   Random Number u     Accident, F^{-1}(u)    Total Time
1       0.391842            0.248660               0.248660
2       0.603216            0.462181               0.710841
3       0.094226            0.049483               0.760324
4       0.092443            0.048499               0.808823
5       0.489792            0.336468               1.145291

The first accident occurred at time .24866 and the second accident occurred .462181 time units later, at a total time of .710841. These results are in continuous time. □
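The inverse transformation step can be checked directly. For an exponential distribution with rate λ, $F(t) = 1 - e^{-\lambda t}$, so $F^{-1}(u) = -\ln(1 - u)/\lambda$. The sketch below (function names are ours) feeds the table's random numbers through this formula with λ = 2. Because the printed random numbers are themselves rounded to six places, the results agree with the table to about five decimal places.

```python
import math


def exponential_inverse_cdf(u, lam):
    """Inverse of F(t) = 1 - exp(-lam * t): returns -ln(1 - u) / lam."""
    return -math.log(1.0 - u) / lam


def waiting_time_table(uniforms, lam):
    """Rows of (trial, u, time to next accident, total time)."""
    rows, total = [], 0.0
    for trial, u in enumerate(uniforms, start=1):
        wait = exponential_inverse_cdf(u, lam)
        total += wait
        rows.append((trial, u, wait, total))
    return rows


# Random numbers taken from the table in Example 12.3 (rate lam = 2 per month).
TABLE_UNIFORMS = [0.391842, 0.603216, 0.094226, 0.092443, 0.489792]

if __name__ == "__main__":
    for trial, u, wait, total in waiting_time_table(TABLE_UNIFORMS, lam=2.0):
        print(f"{trial}  {u:.6f}  {wait:.6f}  {total:.6f}")
```

Counting the cumulative times that fall below 1 month gives the N = 4 accidents in the first month.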


The reader might note that the first 4 accidents occurred before 
one time unit (month) had been completed. Thus the random number of 
accidents in one month was N = 4 accidents. In this exponential simulation, we have simulated one value of the Poisson random variable N
which gives the number of accidents in a month. One method for 
simulating the Poisson random variable is based on using exponential 
simulations in this way. 
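The counting idea in the last paragraph can be sketched as follows. Gaps between accidents are drawn as exponential with rate λ using Python's `random.expovariate`, and we count how many accidents arrive before time 1. The function name and the choice λ = 2 (matching Example 12.3) are ours. As a check, the sample mean and variance should both be close to λ, because a Poisson random variable has mean and variance λ.

```python
import random


def poisson_by_exponential_arrivals(lam, rng, horizon=1.0):
    """Count arrivals in [0, horizon) when inter-arrival times are exponential(lam).

    The count has a Poisson distribution with mean lam * horizon.
    """
    count = 0
    clock = rng.expovariate(lam)  # time of the first arrival
    while clock < horizon:
        count += 1
        clock += rng.expovariate(lam)
    return count


if __name__ == "__main__":
    rng = random.Random(12)
    samples = [poisson_by_exponential_arrivals(2.0, rng) for _ in range(50_000)]
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
    print(f"mean {mean:.3f}, variance {var:.3f}")  # both should be near 2
```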
12.1.5 Simulation and Theory


We have provided simulations here to illustrate the basic intuitions 
behind simple stochastic processes. The processes studied here could 
have been analyzed without simulation, since there are theorems to 
determine their long-term behavior. We will illustrate the theory used on 
random walks and fund switching in Section 12.2. The reader can find 
additional useful theoretical results for Poisson processes in other texts. 
However, simulation plays a very important role in modern stochastic 
analyses. The processes given here are very basic, but in many other 
practical examples the stochastic processes are so complex that exact 
theoretical results are not available and simulation is the only way to 
seek long-term patterns.


12.2 Finite Markov Chains 
12.2.1 Examples 


The first two examples in Section 12.1 were examples of finite Markov 
chains. We will return to Example 12.1 to illustrate the basic properties 
of a finite Markov chain. 


Example 12.4 In the gambler's ruin example, two gamblers bet 
on successive coin tosses. The two gamblers have exactly 4 coins 
between them. On each toss, the probability that a gambler wins or loses 
a coin is .50. The gamblers play until one has all the coins. At the end 
of each play, there are only 5 possibilities for a gambler: he may have 0, 
1, 2, 3, or 4 coins. The number of coins the gambler has is referred to as 
his state in the process. In other words, if the gambler has exactly i 
coins, he is said to be in State i. The process is called finite because the 
number of states is finite. If the gambler is in State 2, there is a .50 
probability of moving to State 3 and a .50 probability of moving to State 1. 
The probability of moving to any other state is 0, since only one coin is 
won or lost on each play. It is helpful to have a general notation for the
probability of moving from one state to another. The probability of
moving from State i to State j on a single toss is called a transition
probability and is written as $p_{ij}$. In our example, $p_{23} = .50$, $p_{21} = .50$,
$p_{24} = 0$, $p_{22} = 0$, and $p_{01} = 0$.
The last probability is of special interest. Once you are in State 0,
you have lost all your money and play stops. The probability of going to 
any other state is 0. In this process, the States 0 and 4 are called 
absorbing states, because once you reach them the game ends and the 
probability of leaving the state is 0. Since there are only finitely many 
states, we can display all the transition probabilities in a table. This is 
done for the gambler’s ruin process in the next table. The beginning 
states are displayed in the left column, the ending states in the first row, 
and the probabilities in the body of the table. 


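                   Ending state
Beginning state    0     1     2     3     4
0                  1     0     0     0     0
1                  .5    0     .5    0     0
2                  0     .5    0     .5    0
3                  0     0     .5    0     .5
4                  0     0     0     0     1

It is simpler to write the transition probabilities $p_{ij}$ in matrix form, without including the states. The resulting matrix is called the transition matrix P. For our gambler's ruin example, the transition matrix is

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ .5 & 0 & .5 & 0 & 0 \\ 0 & .5 & 0 & .5 & 0 \\ 0 & 0 & .5 & 0 & .5 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$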


A key feature of the gambler’s ruin process is the fact that the gambler’s 
next state depends only on his last state and not on any previous states. 
If the gambler is in State 2, he will move to State 3 on the next play with 
probability .50. This does not depend in any way on the fact that he may 
have been in State 1 or State 3 a few plays before. The probability of 
moving from State i to State j in the next play depends only on being in
State i now, and thus can be written simply as $p_{ij}$.


In general, a finite Markov chain is a stochastic process in which there are only a finite number of states $s_0, s_1, s_2, \ldots, s_n$. The probability of moving from State i to State j in one step of the process is written as $p_{ij}$, and depends only on the present State i, not on any prior state. The matrix $P = [p_{ij}]$ is the transition matrix of the process. Our next example is taken from the fund switching process of Example 12.2.


Example 12.5 Members of a pension plan may invest their pension 
savings in either Fund 0 or Fund 1. There are only two states, 0 and 1. 
Each month members may switch funds if they wish. The probabilities 
of switching remain constant from month to month. The probability of 
switching from 0 to 1 is $p_{01} = .45$. The probability of switching from 1
to 0 is $p_{10} = .30$. The transition matrix for this process is

$$P = \begin{bmatrix} .55 & .45 \\ .30 & .70 \end{bmatrix}$$

This process is different from the gambler's ruin process. There are no absorbing states. It is possible to go from any state to any other. □


The use of constant transition probabilities for fund switching may 
not be completely realistic. It is difficult to accept the assumption that 
the transition probability $p_{ij}$ is the same for every step of the fund-switching process and does not change over time. Investor behavior is
influenced by a number of factors which may change over time. It is also 
likely that investor behavior is influenced by past history, so that the 
probability of a switch may depend on what happened two months ago 
as well as the present state. We will use this process to illustrate the 
mathematics of Markov chains in the next section, but it is important to 
remember that results will change if the probabilities $p_{ij}$ change over
time instead of remaining constant. 


12.2.2 Probability Calculations for Markov Processes 


Example 12.6 Suppose the pension plan in Example 12.5 started 
at time 0 with 50% of its employees in Fund 0 and 50% of its employees 
in Fund 1. We would like to know the percent of employees in each fund 
at the end of the first month. In probability language, the probabilities of 
an employee being in Fund 0 or Fund 1 at time 0 are each .50, and we 
would like to find the probability that an employee is in either fund at 
time 1. To analyze this, we will use the notation

$p_i^{(k)}$ = the probability of being in State i at time k.

We are given that

$$p_0^{(0)} = .50 \quad\text{and}\quad p_1^{(0)} = .50.$$

We need to find $p_0^{(1)}$ and $p_1^{(1)}$. We can find $p_0^{(1)}$ using basic rules of probability from Chapter 2.


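$$\begin{aligned}
p_0^{(1)} &= P(\text{An employee is in Fund 0 at time 1}) \\
&= P(\text{The employee started in Fund 0 and did not switch}) \\
&\quad + P(\text{The employee started in Fund 1 and switched to Fund 0}) \\
&= P(\text{Stay in Fund 0} \mid \text{Start in Fund 0}) \times P(\text{Start in Fund 0}) \\
&\quad + P(\text{Switch to Fund 0} \mid \text{Start in Fund 1}) \times P(\text{Start in Fund 1}) \\
&= p_{00}\, p_0^{(0)} + p_{10}\, p_1^{(0)} \\
&= .55(.50) + .30(.50) = .425
\end{aligned}$$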
We can find $p_1^{(1)}$ in a similar manner.

$$p_1^{(1)} = p_{01}\, p_0^{(0)} + p_{11}\, p_1^{(0)} = .45(.50) + .70(.50) = .575$$


This sequence of calculations can be written much more simply using 
the transition matrix P. Note that 


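$$\begin{aligned}
[p_0^{(0)}, p_1^{(0)}]\,P &= [.50, .50] \begin{bmatrix} .55 & .45 \\ .30 & .70 \end{bmatrix} \\
&= [.50(.55) + .50(.30),\ .50(.45) + .50(.70)] \\
&= [.425, .575] = [p_0^{(1)}, p_1^{(1)}]
\end{aligned}$$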


We can calculate the probabilities of being in States 0 or 1 at time 1
using matrix multiplication. □
In the preceding calculation, we have shown that we can use multiplication by P to move from the probability distribution of funds at time 0 to the probability distribution at time 1.

$$[p_0^{(0)}, p_1^{(0)}]\,P = [p_0^{(1)}, p_1^{(1)}]$$


The same reasoning can be used to show that we can move from the 
distribution at any time i to the distribution at the next time i + 1 using
multiplication by P.

$$[p_0^{(i)}, p_1^{(i)}]\,P = [p_0^{(i+1)}, p_1^{(i+1)}]$$


This gives us a simple way to find the probability distribution of funds at 
any point in time. 


In general, if we are given the probability of being in each fund at time 
0, we can find the probability distribution for the two funds at time n 
using the identity 


$$[p_0^{(n)}, p_1^{(n)}] = [p_0^{(0)}, p_1^{(0)}]\,P^n$$


Below are the first 7 powers of the transition matrix for fund switching, along with the distributions for the first 7 months starting at [.50, .50].

n    P^n                  p^(n) = [p_0^(n), p_1^(n)]
0                         [0.5000  0.5000]
1    0.5500  0.4500       [0.4250  0.5750]
     0.3000  0.7000
2    0.4375  0.5625       [0.4063  0.5938]
     0.3750  0.6250
3    0.4094  0.5906       [0.4016  0.5984]
     0.3938  0.6063
4    0.4023  0.5977       [0.4004  0.5996]
     0.3984  0.6016
5    0.4006  0.5994       [0.4001  0.5999]
     0.3996  0.6004
6    0.4001  0.5999       [0.4000  0.6000]
     0.3999  0.6001
7    0.4000  0.6000       [0.4000  0.6000]
     0.4000  0.6000
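The table can be regenerated with a few lines of Python. This sketch uses plain nested lists and helper names of our own. It computes $P^n$ by repeated multiplication and $p^{(n)}$ by the one-step rule $p^{(n)} = p^{(n-1)}P$, and the two routes can be confirmed to agree with $p^{(0)}P^n$. Printed values match the table up to rounding in the last digit.

```python
def row_times_matrix(p, P):
    """Return the row vector p times the matrix P (both plain lists)."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(P[0]))]


def matrix_product(A, B):
    """Return A B, computed one row of A at a time."""
    return [row_times_matrix(row, B) for row in A]


# Fund-switching transition matrix from Example 12.5 and the starting split.
P = [[0.55, 0.45],
     [0.30, 0.70]]
p0 = [0.50, 0.50]

history = []                    # entries (n, P^n, p^(n))
Pn = [[1.0, 0.0], [0.0, 1.0]]   # P^0 is the identity
p = p0
history.append((0, Pn, p))
for n in range(1, 8):
    Pn = matrix_product(Pn, P)  # P^n = P^(n-1) P
    p = row_times_matrix(p, P)  # p^(n) = p^(n-1) P
    history.append((n, Pn, p))

for n, power, dist in history:
    rows = ["  ".join(f"{x:.4f}" for x in row) for row in power]
    print(n, " | ".join(rows), " ->", "  ".join(f"{x:.4f}" for x in dist))
```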
This calculation shows us that even though the pension plan started with 50% of the employees in each fund, the distribution of employees appears to be stabilizing with 40% in Fund 0 and 60% in Fund 1. In Section 12.3 we will show that there will eventually be 40% of all employees in Fund 0 and 60% in Fund 1, no matter what the starting distribution is.


The matrix multiplication procedure works for any finite Markov process. If the states are $s_0, s_1, s_2, \ldots, s_n$, the probability distribution at time i is the row vector

$$p^{(i)} = [p_0^{(i)}, p_1^{(i)}, \ldots, p_n^{(i)}].$$
If P is the transition matrix for the process, then we can move from the probability distribution at time i to the probability distribution at time i + 1 using the identity

$$p^{(i+1)} = p^{(i)} P.$$

The probability distribution at time n is related to the initial probability distribution $p^{(0)}$ by the identity

$$p^{(n)} = p^{(0)} P^n.$$


Example 12.7 For the gambler's ruin example with 4 coins between the two gamblers, the transition matrix was

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ .5 & 0 & .5 & 0 & 0 \\ 0 & .5 & 0 & .5 & 0 \\ 0 & 0 & .5 & 0 & .5 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

Suppose a gambler starts with 1 coin. His initial probability distribution at time 0 is given by the row vector

$$p^{(0)} = [0, 1, 0, 0, 0].$$

His probability distribution at time 1 is given by

$$p^{(1)} = p^{(0)}P = [.5, 0, .5, 0, 0].$$

We can observe what happens to this gambler in the long run by looking at $p^{(n)} = p^{(0)}P^n$ for larger values of n. Such calculations are tedious when done by hand, but calculators such as the TI-83 will do them easily. Below are the results for n = 12. The matrix $P^{12}$ is given next with all entries rounded to three places.
$$P^{12} = \begin{bmatrix} 1.000 & 0.000 & 0.000 & 0.000 & 0.000 \\ 0.742 & 0.008 & 0.000 & 0.008 & 0.242 \\ 0.492 & 0.000 & 0.016 & 0.000 & 0.492 \\ 0.242 & 0.008 & 0.000 & 0.008 & 0.742 \\ 0.000 & 0.000 & 0.000 & 0.000 & 1.000 \end{bmatrix}$$


The probability distribution for the gambler after 12 plays is the row 
vector 


[.742, .008, .000, .008, .242]. 


We will show in Section 12.4 that the long-term probability distribution
for a gambler starting with one out of 4 coins is [.75, 0, 0, 0, .25]. □
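The same calculation can be done without a calculator. This sketch uses our own helper names and plain Python lists. It builds the gambler's ruin matrix for any number of coins and forms $P^{12}$ by repeated multiplication. Every entry here is a sum of powers of 1/2, so the floating-point results are exact. For example, the row for a gambler starting with 1 coin is [0.7421875, 0.0078125, 0, 0.0078125, 0.2421875], which rounds to the row vector above.

```python
def gamblers_ruin_matrix(total_coins, p_win=0.5):
    """Transition matrix on states 0..total_coins; 0 and total_coins are absorbing."""
    size = total_coins + 1
    P = [[0.0] * size for _ in range(size)]
    P[0][0] = 1.0
    P[total_coins][total_coins] = 1.0
    for i in range(1, total_coins):
        P[i][i + 1] = p_win        # win a coin
        P[i][i - 1] = 1.0 - p_win  # lose a coin
    return P


def matrix_product(A, B):
    """Return the matrix product A B for plain nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matrix_power(P, n):
    """Return P^n by repeated multiplication (adequate for small matrices)."""
    result = [[1.0 if i == j else 0.0 for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        result = matrix_product(result, P)
    return result


P12 = matrix_power(gamblers_ruin_matrix(4), 12)
p0 = [0, 1, 0, 0, 0]  # the gambler starts with 1 coin
p12 = [sum(p0[i] * P12[i][j] for i in range(5)) for j in range(5)]

for row in P12:
    print(" ".join(f"{x:.3f}" for x in row))
print("p(12) =", [round(x, 3) for x in p12])
```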


12.3 Regular Markov Processes 
12.3.1 Basic Properties 


We return to the analysis of fund switching in Example 12.6 to illustrate the basic properties of regular finite Markov chains. The transition matrix for that process was

$$P = \begin{bmatrix} .55 & .45 \\ .30 & .70 \end{bmatrix}$$


Note that all the entries in P are positive. A finite Markov chain is called regular if, for some n, all entries in $P^n$ are positive. Thus the fund-switching process above is regular with n = 1. An important consequence of this definition is that for a regular process it is always possible to move from State i to State j in exactly n steps for any choice of i and j. Note that the gambler's ruin process is not regular. If you have lost all your money and are in State 0, it is not possible to move to any other state.

We can describe the long-term behavior of regular finite Markov processes by looking at the limit of $P^n$ as n approaches infinity. We observed in Example 12.6 that the matrix $P^n$ rapidly approached a limiting matrix L. The matrices $P^6$ and $P^7$ were

$$P^6 = \begin{bmatrix} 0.4001 & 0.5999 \\ 0.3999 & 0.6001 \end{bmatrix} \quad\text{and}\quad P^7 = \begin{bmatrix} 0.4000 & 0.6000 \\ 0.4000 & 0.6000 \end{bmatrix}$$


Note that the limiting matrix L had identical rows. It can be proved that 
this happens for any regular finite Markov chain. 


Limit of $P^n$ for a Regular Finite Markov Chain

If P is the transition matrix of a regular finite Markov process, then the powers $P^n$ converge to a limiting matrix L:

$$\lim_{n \to \infty} P^n = L.$$

The rows of L are all equal to the same row vector $\ell$.


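In our example of fund switching, the limiting matrix L was

$$L = \begin{bmatrix} .4 & .6 \\ .4 & .6 \end{bmatrix}$$

and the common row vector was $\ell = [.4, .6]$. In that example, the distribution of employees was shown to approach $\ell$ over time. This will happen no matter what the distribution of employees is at time 0. If the initial distribution is $[p_0^{(0)}, p_1^{(0)}]$, then the limiting distribution is

$$\begin{aligned}
\lim_{n\to\infty} [p_0^{(0)}, p_1^{(0)}]\,P^n &= [p_0^{(0)}, p_1^{(0)}] \lim_{n\to\infty} P^n = [p_0^{(0)}, p_1^{(0)}] \begin{bmatrix} .4 & .6 \\ .4 & .6 \end{bmatrix} \\
&= [.4p_0^{(0)} + .4p_1^{(0)},\ .6p_0^{(0)} + .6p_1^{(0)}] \\
&= [.4, .6],
\end{aligned}$$

since $p_0^{(0)} + p_1^{(0)} = 1$. Note that the limiting distribution is given by the common row vector $\ell$ of L, and that $p^{(0)}L = \ell$. This, too, holds for every regular finite Markov chain.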
Limiting Distribution for a Regular Finite Markov Chain

For any regular finite Markov chain, $p^{(0)}L = \ell$ no matter what initial distribution $p^{(0)}$ is chosen. The limiting probability distribution is given by the common row vector $\ell$ of the limiting matrix L.


12.3.2 Finding the Limiting Matrix of a Regular Finite Markov Chain

The vector $\ell$ can be found using a simple system of equations. The system is based on the observation that $\ell P = \ell$. Intuitively, this equation tells us that once we have reached the limiting distribution, future steps of the process leave us there. A derivation of the equation $\ell P = \ell$ is outlined in Exercise 12-12. We will use this equation to find the limiting distribution of the fund-switching process in the next example.


Example 12.8 If we write the unknown vector $\ell$ for the fund-switching process as [x, y], the equation $\ell P = \ell$ becomes

$$[x, y] \begin{bmatrix} .55 & .45 \\ .30 & .70 \end{bmatrix} = [x, y].$$

This reduces to the following system of equations:

$$\begin{aligned} .55x + .30y &= x \\ .45x + .70y &= y \end{aligned}$$

This, in turn, reduces to the following linear homogeneous system:

$$\begin{aligned} -.45x + .30y &= 0 \\ .45x - .30y &= 0 \end{aligned}$$

This system has infinitely many solutions, but we are looking for the solution which is a probability distribution, so that it satisfies the condition x + y = 1. Thus we solve the following system:

$$\begin{aligned} -.45x + .30y &= 0 \\ .45x - .30y &= 0 \\ x + y &= 1 \end{aligned}$$

The first equation gives y = 1.5x, and then x + 1.5x = 1. The solution of this system is x = .40 and y = .60. Thus we have demonstrated that $\ell = [.40, .60]$. This procedure works in general. □


Finding the Limiting Distribution for a Regular Finite Markov Chain

For any regular finite Markov chain, we can find the common row vector $\ell = [\ell_1, \ell_2, \ldots, \ell_n]$ of the limiting matrix L by solving the system of n + 1 linear equations given by

$$[\ell_1, \ell_2, \ldots, \ell_n]\,P = [\ell_1, \ell_2, \ldots, \ell_n]$$

and

$$\ell_1 + \ell_2 + \cdots + \ell_n = 1.$$
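The n + 1 equations can be solved mechanically. Below is a sketch, using a function name of our own, that works in exact fractions with Gauss-Jordan elimination. The n equations in $\ell P = \ell$ always add up to 0 = 0, so one of them is redundant. The sketch therefore replaces the last one with the normalization $\ell_1 + \cdots + \ell_n = 1$. It assumes the chain is regular, so that the solution is unique.

```python
from fractions import Fraction


def stationary_distribution(P):
    """Solve l P = l together with l_1 + ... + l_n = 1, in exact arithmetic.

    Equation j of l P = l reads sum_i l_i * (P[i][j] - [i == j]) = 0.
    These n equations add up to 0 = 0, so the last one is replaced by the
    normalization row.  Assumes a regular chain, so the solution is unique.
    """
    n = len(P)
    # str() first, so a decimal entry such as 0.55 becomes exactly 11/20.
    F = [[Fraction(str(x)) for x in row] for row in P]
    A = [[F[i][j] - (1 if i == j else 0) for i in range(n)] + [Fraction(0)]
         for j in range(n)]
    A[-1] = [Fraction(1)] * n + [Fraction(1)]

    # Gauss-Jordan elimination on the augmented matrix A.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]


if __name__ == "__main__":
    fund_switching = [[0.55, 0.45], [0.30, 0.70]]
    print(stationary_distribution(fund_switching))  # [Fraction(2, 5), Fraction(3, 5)]
```

The same function applies unchanged to the three-fund matrix of Example 12.9.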


Example 12.9 Another pension plan gives its employees the choice of three funds: Fund 0, Fund 1 and Fund 2. Participants are permitted to change funds at the end of each month. The transition matrix for the fund-switching process is given by

$$P = \begin{bmatrix} .2 & .5 & .3 \\ .3 & .6 & .1 \\ .2 & .3 & .5 \end{bmatrix}$$

Then the limiting distribution $\ell = [x, y, z]$ can be found by solving the following system:

$$[x, y, z] \begin{bmatrix} .2 & .5 & .3 \\ .3 & .6 & .1 \\ .2 & .3 & .5 \end{bmatrix} = [x, y, z], \qquad x + y + z = 1$$

This leads to the following system of equations:
$$\begin{aligned} -.8x + .3y + .2z &= 0 \\ .5x - .4y + .3z &= 0 \\ .3x + .1y - .5z &= 0 \\ x + y + z &= 1 \end{aligned}$$

The solution is x = .25, y = .50 and z = .25. In the long run, the pension plan will have 25% of employees in Fund 0, 50% in Fund 1, and 25% in Fund 2. This solution can be checked by evaluating powers of the transition matrix. The TI-83 (with rounding set to three places) gives

$$P^6 = \begin{bmatrix} .250 & .500 & .250 \\ .250 & .501 & .249 \\ .250 & .499 & .251 \end{bmatrix}$$

and, for a higher power of P,

$$\begin{bmatrix} .250 & .500 & .250 \\ .250 & .500 & .250 \\ .250 & .500 & .250 \end{bmatrix}$$

Thus this switching process should be very close to its limit in 6 or 7 months. □
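A short loop makes the convergence visible. The sketch below uses our own helper names. It multiplies the three-fund matrix by itself and reports, for each power n, the largest distance between an entry of $P^n$ and the corresponding entry of L, whose rows all equal [.25, .50, .25]. The gap shrinks by roughly the same factor at every step, which is why a handful of months is enough here.

```python
def matrix_product(A, B):
    """Return the matrix product A B for plain nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def gaps_from_limit(P, ell, max_power):
    """For n = 1, ..., max_power, the largest |(P^n)_ij - ell_j| over all i, j."""
    gaps, Pn = [], P
    for n in range(1, max_power + 1):
        if n > 1:
            Pn = matrix_product(Pn, P)  # P^n = P^(n-1) P
        gaps.append(max(abs(Pn[i][j] - ell[j])
                        for i in range(len(Pn)) for j in range(len(ell))))
    return gaps


# Three-fund transition matrix of Example 12.9 and its limiting row vector.
P = [[0.2, 0.5, 0.3],
     [0.3, 0.6, 0.1],
     [0.2, 0.3, 0.5]]
ELL = [0.25, 0.50, 0.25]

if __name__ == "__main__":
    for n, gap in enumerate(gaps_from_limit(P, ELL, 10), start=1):
        print(f"n = {n:2d}: largest |P^n - L| entry = {gap:.6f}")
```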


12.4 Absorbing Markov Chains 
12.4.1 Another Gambler’s Ruin Example 


The gambler’s ruin process in Example 12.7 did not follow the patterns 
observed in Section 12.3, since it was not a regular process. It was not 
possible to get from any state to any other, since it was impossible to 
leave an absorbing state. However, the gambler’s ruin process had a 
long-term pattern of another kind. In the next example we will look at a 
simpler gambler’s ruin problem (with three coins instead of four) to 
illustrate the basic properties of absorbing Markov chains. 
Example 12.10 Two gamblers start with a total of 3 coins between them. As before, they bet on coin tosses until one player has all the coins. In this case, the table of states and probabilities is as follows:

                   Ending state
Beginning state    0     1     2     3
0                  1     0     0     0
1                  .5    0     .5    0
2                  0     .5    0     .5
3                  0     0     0     1

The transition matrix is

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ .5 & 0 & .5 & 0 \\ 0 & .5 & 0 & .5 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$


This chain is called an absorbing Markov chain because it is possible to 
go from any state to an absorbing state. If we take powers of the matrix P, 
we will see a long-term pattern develop. For example, for a large power of P the TI-83 calculator gives the result (with rounding to 3 places)

$$P^n \approx \begin{bmatrix} 1.000 & .000 & .000 & .000 \\ .667 & .000 & .000 & .333 \\ .333 & .000 & .000 & .667 \\ .000 & .000 & .000 & 1.000 \end{bmatrix}$$

This seems to imply the intuitive results that one player will eventually win all the coins, and the player with 2 out of 3 coins will win all the coins with a probability of 2/3. □
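The same pattern can be reproduced with a few lines of Python. In the sketch below (helper names are ours), n = 20 is simply an illustrative large power. From either non-absorbing state, half of the probability that has not yet been absorbed is absorbed at each step, so the rows settle at [1, 0, 0, 0], [2/3, 0, 0, 1/3], [1/3, 0, 0, 2/3], and [0, 0, 0, 1] to three places.

```python
def matrix_product(A, B):
    """Return the matrix product A B for plain nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matrix_power(P, n):
    """Return P^n by repeated multiplication (adequate for small matrices)."""
    result = [[1.0 if i == j else 0.0 for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        result = matrix_product(result, P)
    return result


# Three-coin gambler's ruin, states in their natural order 0, 1, 2, 3.
P = [[1.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 1.0]]

P20 = matrix_power(P, 20)  # 20 is just an illustrative "large" power
for row in P20:
    print("  ".join(f"{x:.3f}" for x in row))
```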


12.4.2 Probabilities of Absorption 


The statement that one player will eventually win all the coins in this 
process is equivalent to the statement that the probability of the 
absorbing chain eventually reaching an absorbing state is 1. We will not 
prove this, but it is true. 
The probability that an absorbing Markov chain will eventually
reach an absorbing state is 1. 


The major task is to find the exact probability of eventually ending 
up in each absorbing state. In order to do this, it helps to rewrite the table 
for the process with the absorbing states first. For the three-coin 
gambler’s ruin, the table changes to the following table. 


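                   Ending state
Beginning state    0     3     1     2
0                  1     0     0     0
3                  0     1     0     0
1                  .5    0     0     .5
2                  0     .5    .5    0

Now the transition matrix is written differently. The reader must remember that the order of states has changed (rows and columns are now in the order 0, 3, 1, 2).

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ .5 & 0 & 0 & .5 \\ 0 & .5 & .5 & 0 \end{bmatrix}$$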


This matrix can be partitioned into four distinct parts in a natural way. 


The matrix in the upper left corner is denoted by I; it shows that the 
probability of staying in each absorbing state is 1 and the probability of 
leaving is 0. The matrix in the lower left corner is denoted by R; it gives 
the probabilities of going in one step from each non-absorbing state to 
each absorbing state. If we use the transition probability notation,

$$R = \begin{bmatrix} p_{10} & p_{13} \\ p_{20} & p_{23} \end{bmatrix} = \begin{bmatrix} .5 & 0 \\ 0 & .5 \end{bmatrix}$$
The matrix in the lower right corner is denoted by Q; it shows the one-step
probabilities of moving between the non-absorbing states.

$$Q = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = \begin{bmatrix} 0 & .5 \\ .5 & 0 \end{bmatrix}$$
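When the transition matrix is arranged this way it is said to be in
standard form. We could write this schematically as

$$P = \begin{bmatrix} I & 0 \\ R & Q \end{bmatrix}$$

where 0 denotes the block of zeros in the upper right corner, since the chain
cannot move from an absorbing state to a non-absorbing state.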

We will use the matrices introduced above to solve for the probabilities
of ending up in each absorbing state. The absorption probabilities we need
to find are

$a_{ij}$ = the probability of eventually being absorbed in the absorbing
State j, from a start in the non-absorbing State i.


In this problem, there are four such unknown probabilities: $a_{10}$, $a_{20}$,
$a_{13}$, and $a_{23}$. We can write four equations in these four unknowns by
setting up some basic probability relationships. The first unknown is

$a_{10}$ = the probability of eventually being absorbed in
State 0, from a start in the non-absorbing State 1.


There are three ways to start in State 1 and eventually be absorbed in 
State 0. They are given below with their probabilities. 


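P(move from State 1 to State 0 in one step) = $p_{10}$

P(move from State 1 to State 1 in one step and eventually reach State 0)
= $p_{11}a_{10}$

P(move from State 1 to State 2 in one step and eventually reach State 0)
= $p_{12}a_{20}$

In the last two cases, the process starts afresh from the state reached
after the first move, so the probability of eventually reaching State 0 from
there is $a_{10}$ or $a_{20}$. The desired $a_{10}$ is the sum of these three
probabilities:

$$a_{10} = p_{10} + p_{11}a_{10} + p_{12}a_{20} = .5 + 0a_{10} + .5a_{20}$$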
We can reason similarly to obtain three more linear equations.

$$a_{20} = p_{20} + p_{21}a_{10} + p_{22}a_{20} = 0 + .5a_{10} + 0a_{20}$$
$$a_{13} = p_{13} + p_{11}a_{13} + p_{12}a_{23} = 0 + 0a_{13} + .5a_{23}$$
$$a_{23} = p_{23} + p_{21}a_{13} + p_{22}a_{23} = .5 + .5a_{13} + 0a_{23}$$


We now have a system of four equations in four unknowns which 
can be solved for the absorption probabilities. The matrix notation 
introduced in this section can make this task considerably easier. The 
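four simultaneous equations are equivalent to the single matrix equation

$$\begin{bmatrix} a_{10} & a_{13} \\ a_{20} & a_{23} \end{bmatrix} = \begin{bmatrix} p_{10} & p_{13} \\ p_{20} & p_{23} \end{bmatrix} + \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} \begin{bmatrix} a_{10} & a_{13} \\ a_{20} & a_{23} \end{bmatrix}$$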
If we write A for the unknown matrix of absorption probabilities, this
matrix equation is

$$A = R + QA.$$


We can then solve this equation for A.

$$A - QA = R$$
$$(I - Q)A = R$$
$$A = (I - Q)^{-1}R$$
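The derivation can be checked with exact fractions. This sketch assumes the non-absorbing states are ordered 1, 2 and the absorbing states 0, 3, as in R and Q above. Rather than forming $(I - Q)^{-1}$ explicitly, it solves $(I - Q)A = R$ by Gauss-Jordan elimination, which yields the same matrix A. The helper `solve` is an illustrative name and is not a library function.

```python
from fractions import Fraction as F


def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with exact fractions.

    M is n x n and B is n x m; both are lists of rows.
    Assumes M is invertible (true when the chain is absorbing).
    """
    n = len(M)
    # Build the augmented matrix [M | B] with Fraction entries.
    aug = [[F(x) for x in list(M[i]) + list(B[i])] for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot equals 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


half = F(1, 2)
Q = [[0, half], [half, 0]]   # rows/columns: States 1, 2
R = [[half, 0], [0, half]]   # rows: States 1, 2; columns: States 0, 3
I_minus_Q = [[(1 if i == j else 0) - Q[i][j] for j in range(2)]
             for i in range(2)]

A = solve(I_minus_Q, R)
for row in A:
    print([str(x) for x in row])
```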


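For our three-coin gambler's ruin problem, the values of the necessary
matrices are

$$I - Q = \begin{bmatrix} 1 & -.5 \\ -.5 & 1 \end{bmatrix}$$

and

$$R = \begin{bmatrix} .5 & 0 \\ 0 & .5 \end{bmatrix}$$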
Then

$$(I - Q)^{-1} = \begin{bmatrix} \frac{4}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{4}{3} \end{bmatrix}$$


We find that the matrix of absorption probabilities is

$$A = (I - Q)^{-1}R = \begin{bmatrix} \frac{4}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{4}{3} \end{bmatrix} \begin{bmatrix} .5 & 0 \\ 0 & .5 \end{bmatrix} = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{2}{3} \end{bmatrix}$$

The top row of the matrix A shows that $a_{10} = 2/3$ and $a_{13} = 1/3$. A
gambler with one coin will end up with no coins with probability 2/3, and
with all three coins with probability 1/3, as predicted. The second row of the
matrix can be interpreted similarly. Another item of interest is the
expected number of times a gambler will be in each non-absorbing state
if he starts in a particular non-absorbing state.


$n_{ij}$ = the expected number of visits (before absorption) to
non-absorbing State j, from a start in the non-absorbing State i.


In the three-coin gambler's ruin problem, we would like to find the
entries in the matrix

$$N = \begin{bmatrix} n_{11} & n_{12} \\ n_{21} & n_{22} \end{bmatrix}$$


It can also be shown that

$$N = (I - Q)^{-1}.$$


Thus in the three-coin gambler's ruin problem,

$$N = (I - Q)^{-1} = \begin{bmatrix} \frac{4}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{4}{3} \end{bmatrix}$$
For a gambler with one coin, the expected number of visits to State 1 is
4/3 (including a count of 1 for the start in State 1 and an expected value
of 1/3 subsequent visits before absorption), and the expected number of
visits to State 2 before absorption is 2/3. Since one toss is made from each
visit to a non-absorbing state, the expected length of the game is
4/3 + 2/3 = 2 tosses, so the game will end fairly soon.
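N and its row sums can also be computed with exact rational arithmetic. The sketch below assumes the non-absorbing states are ordered 1, 2, as in Q, and finds $N = (I - Q)^{-1}$ by Gauss-Jordan elimination. The helper `invert` is an illustrative name and is not a library function.

```python
from fractions import Fraction as F


def invert(M):
    """Return the inverse of a square matrix by Gauss-Jordan elimination."""
    n = len(M)
    # Augment M with the identity matrix, using exact fractions.
    aug = [[F(x) for x in M[i]] + [F(int(i == j)) for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


half = F(1, 2)
I_minus_Q = [[1, -half], [-half, 1]]   # non-absorbing States 1, 2
N = invert(I_minus_Q)

for state, row in zip((1, 2), N):
    visits = ", ".join(str(x) for x in row)
    # The row sum is the expected number of tosses before absorption.
    print(f"start in State {state}: visits = [{visits}], expected tosses = {sum(row)}")
```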

We have examined these matrix results for a simple gambler's ruin 
chain, but the same reasoning can be used to show that they apply to any 
absorbing finite Markov chain. 


Absorbing Finite Markov Chains

After the absorbing states are listed first, the transition matrix can always
be written in the form

$$P = \begin{bmatrix} I & 0 \\ R & Q \end{bmatrix}.$$

The matrix of absorption probabilities is given by

$$A = (I - Q)^{-1}R.$$

The entries $n_{ij}$ of the matrix

$$N = (I - Q)^{-1}$$

give the expected number of visits to non-absorbing State j from a
start in non-absorbing State i.
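The boxed results can be packaged as one general routine. The sketch below makes these assumptions: P is a list of rows with exact entries, in any state order; the caller lists the indices of the absorbing states; and the chain really is absorbing (otherwise $I - Q$ is singular and the elimination step fails). Reading Q and R off P by index has the same effect as reordering P into standard form. The names `invert` and `absorbing_summary` are illustrative.

```python
from fractions import Fraction as F


def invert(M):
    """Inverse of a square matrix by Gauss-Jordan elimination (exact)."""
    n = len(M)
    aug = [[F(x) for x in M[i]] + [F(int(i == j)) for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def absorbing_summary(P, absorbing):
    """Return (transient, absorbing, N, A) for an absorbing chain."""
    transient = [i for i in range(len(P)) if i not in absorbing]
    # Q: transient -> transient block; R: transient -> absorbing block.
    Q = [[P[i][j] for j in transient] for i in transient]
    R = [[P[i][j] for j in absorbing] for i in transient]
    I_minus_Q = [[int(i == j) - Q[i][j] for j in range(len(Q))]
                 for i in range(len(Q))]
    N = invert(I_minus_Q)
    # A = N R
    A = [[sum(N[i][k] * R[k][j] for k in range(len(R)))
          for j in range(len(absorbing))]
         for i in range(len(N))]
    return transient, absorbing, N, A


half = F(1, 2)
P = [[1, 0, 0, 0],          # three-coin chain, original order 0, 1, 2, 3
     [half, 0, half, 0],
     [0, half, 0, half],
     [0, 0, 0, 1]]
transient, absorbing, N, A = absorbing_summary(P, [0, 3])
for i, row in zip(transient, A):
    probs = ", ".join(f"State {j}: {p}" for j, p in zip(absorbing, row))
    print(f"start in State {i} -> {probs}")
```

When applied to the three-coin matrix in its original order 0, 1, 2, 3, the routine reproduces the matrices A and N found above.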


In the next example, we will apply this theory to the gambler's
ruin problem in which the two gamblers have a total of four coins.


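Example 12.11 The four-coin process has standard form matrix

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ .5 & 0 & 0 & .5 & 0 \\ 0 & 0 & .5 & 0 & .5 \\ 0 & .5 & 0 & .5 & 0 \end{bmatrix}$$

with the states in the order 0, 4, 1, 2, 3. Thus

$$R = \begin{bmatrix} .5 & 0 \\ 0 & 0 \\ 0 & .5 \end{bmatrix}$$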
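and

$$Q = \begin{bmatrix} 0 & .5 & 0 \\ .5 & 0 & .5 \\ 0 & .5 & 0 \end{bmatrix}.$$

We then calculate the following:

$$I - Q = \begin{bmatrix} 1 & -.5 & 0 \\ -.5 & 1 & -.5 \\ 0 & -.5 & 1 \end{bmatrix}$$

$$N = (I - Q)^{-1} = \begin{bmatrix} 1.5 & 1 & .5 \\ 1 & 2 & 1 \\ .5 & 1 & 1.5 \end{bmatrix}$$

$$A = (I - Q)^{-1}R = NR = \begin{bmatrix} 1.5 & 1 & .5 \\ 1 & 2 & 1 \\ .5 & 1 & 1.5 \end{bmatrix} \begin{bmatrix} .5 & 0 \\ 0 & 0 \\ 0 & .5 \end{bmatrix} = \begin{bmatrix} .75 & .25 \\ .50 & .50 \\ .25 & .75 \end{bmatrix}$$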


These absorption probabilities are the ones we suspected on the basis of our
matrix power calculations. For example, a gambler who starts with one
coin has a .75 probability of absorption in State 0 (losing all his coins)
and a .25 probability of absorption in State 4 (winning all four coins). □
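As a cross-check on Example 12.11 that avoids matrix inversion, one can iterate the relation $A = R + QA$ derived earlier, starting from the zero matrix. After k rounds, the iterate equals $(I + Q + \cdots + Q^{k-1})R$, which is the probability of absorption within k tosses. The iterates converge to $(I - Q)^{-1}R$ because the powers of Q shrink to zero for an absorbing chain. The sketch assumes the non-absorbing states are ordered 1, 2, 3 and the absorbing states 0, 4, as in the standard form above. The iteration count is an arbitrary, generous choice.

```python
def mat_mult(X, Y):
    """Multiply matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_add(X, Y):
    """Add two matrices of the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


# Four-coin gambler's ruin in standard form.
Q = [[0.0, 0.5, 0.0],     # rows/columns: States 1, 2, 3
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
R = [[0.5, 0.0],          # rows: States 1, 2, 3; columns: States 0, 4
     [0.0, 0.0],
     [0.0, 0.5]]

A = [[0.0, 0.0] for _ in range(3)]     # start from the zero matrix
for _ in range(200):
    A = mat_add(R, mat_mult(Q, A))     # A <- R + QA

for state, row in zip((1, 2, 3), A):
    print(f"State {state}: P(ruin) = {row[0]:.4f}, P(win all) = {row[1]:.4f}")
```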


12.5 Further Study of Stochastic Processes 


The material in this chapter was included to show the reader that theory
can be developed to study the long-term behavior of stochastic processes.
Much further study and additional coursework are needed to learn the
wide range of additional theory that can be used in financial risk
management. For example, the reader who has had a course in the theory
of interest can get a nice introduction to the stochastic theory of interest
rates by reading Chapter 6 of Broverman [3]. Hopefully the end of this
text has served only as a beginning.
12.6 Exercises


12.1 Simulation Examples

For Exercises 12-1 through 12-3, use the following sequence of random
numbers.

 1. .57230     6. .82496    11. .02480    16. .78322
 2. .85472     7. .52184    12. .99954    17. .00067
 3. .37282     8. .49837    13. .81708    18. .24844
 4. .77133     9. .76729    14. .90535    19. .14118
 5. .20525    10. .50986    15. .76227    20. .47417
12-1. For the two gamblers in Example 12.1, suppose A has 3 coins
and B has 5 coins, and the game is played as described in the
example. Use the random numbers given above to simulate the
game. Which player would win the game, and how many coin
tosses were needed to decide the winner?

12-2. For an employee in the pension plan in Example 12.2, the
probabilities for staying in a fund or switching funds are given in
the following table.

[Table of fund-switching probabilities (rows: starting fund; columns: ending fund); entries not recoverable from the source.]

Use the decision-making process for switching funds described
in the example and the random numbers given above to simulate
the progress of an employee who is initially in Fund 0. How
many times in the next 20 months would he switch to, or stay in,
Fund 1?

12-3. Suppose the waiting time in months between accidents at an
intersection is exponential with λ = 3. Use the method in
Example 12.3 and the random numbers given above to simulate
the time between accidents. How many accidents occur in each
of the first three months at this intersection?
12.2 Finite Markov Chains

12-4. For members in a pension plan, the transition matrix of probabilities
of switching funds is

If the initial probability distribution is $p^{(0)} = [.50, .50]$, find
(a) $p^{(1)}$; (b) $p^{(2)}$.

12-5. The transition matrix for a Markov process with 2 states is

and the initial probability distribution is $p^{(0)} = [.40, .60]$. Find
(a) $p^{(1)}$; (b) $p^{(2)}$.

12-6. The transition matrix for a Markov process with 3 states is

P = [3 × 3 matrix; entries not recoverable from the source]

and the initial probability distribution is $p^{(0)} = [.30, .30, .40]$.
Find $p^{(0)}P$.

12-7. A mutual fund investor has the choice of a stock fund (Fund 0),
a bond fund (Fund 1), and a money market fund (Fund 2). At the
end of each quarter she can move her money from fund to fund.
The probability that she stays in Fund 0 is .60, in Fund 1, .50,
and in Fund 2, .40. If she switches funds, she will move to each
of the other funds with equal probability. If she starts with all of
her money in the stock fund, what is the probability distribution
after two quarters?
12.3 Regular Markov Processes

12-8. For the transition matrix in Exercise 12-4, find the limiting
distribution.

12-9. What is the limiting distribution for the Markov process in
Exercise 12-5?

12-10. What is the limiting distribution for the Markov process in
Exercise 12-6?

12-11. What is the limiting distribution for the investor in Exercise
12-7?

12-12. Prove that if P is the transition matrix of a regular finite Markov
process and L is its limiting distribution, then LP = L. Hint:
Write $LP^n = (LP^{n-1})P$ and take the limit of both sides.

12.4 Absorbing Markov Chains

12-13. In the gambler's ruin example, suppose the game is rigged so
that the probability that A wins each toss is 1/3 and the probability
that B wins each toss is 2/3. Let the states represent the number of
coins that A has at any time, and let the total number of coins
between both players be 3.
(a) Find the matrix N.
(b) Find the matrix A.
(c) If A starts with 2 coins, what is the probability that he will
lose (end in State 0)?

12-14. Let the gamblers in Exercise 12-13 start with 4 coins between
them.
(a) Find the matrix N.
(b) Find the matrix A.
(c) If A starts with 2 coins, what is the probability that he will
lose?


Appendix A 


Values of the Cumulative Distribution Function for the 
Standard Normal Random Variable Z 


                 Second Decimal Place in z
  z      0.00    0.01    0.04    0.05    0.07    0.08    0.09
 0.0   0.5000  0.5040  0.5160  0.5199  0.5279  0.5319  0.5359
 0.1   0.5398  0.5438  0.5557  0.5596  0.5675  0.5714  0.5753
 0.2   0.5793  0.5832  0.5948  0.5987  0.6064  0.6103  0.6141
 0.3   0.6179  0.6217  0.6331  0.6368  0.6443  0.6480  0.6517
 0.4   0.6554  0.6591  0.6700  0.6736  0.6808  0.6844  0.6879
 0.5   0.6915  0.6950  0.7054  0.7088  0.7157  0.7190  0.7224
 0.6   0.7257  0.7291  0.7389  0.7422  0.7486  0.7517  0.7549
 0.7   0.7580  0.7611  0.7704  0.7734  0.7794  0.7823  0.7852
 0.8   0.7881  0.7910  0.7995  0.8023  0.8078  0.8106  0.8133
 0.9   0.8159  0.8186  0.8264  0.8289  0.8340  0.8365  0.8389
 1.0   0.8413  0.8438  0.8508  0.8531  0.8577  0.8599  0.8621
 1.1   0.8643  0.8665  0.8729  0.8749  0.8790  0.8810  0.8830
 1.2   0.8849  0.8869  0.8925  0.8944  0.8980  0.8997  0.9015
 1.3   0.9032  0.9049  0.9099  0.9115  0.9147  0.9162  0.9177
 1.4   0.9192  0.9207  0.9251  0.9265  0.9292  0.9306  0.9319
 1.5   0.9332  0.9345  0.9382  0.9394  0.9418  0.9429  0.9441
 1.6   0.9452  0.9463  0.9495  0.9505  0.9525  0.9535  0.9545
 1.7   0.9554  0.9564  0.9591  0.9599  0.9616  0.9625  0.9633
 1.8   0.9641  0.9649  0.9671  0.9678  0.9693  0.9699  0.9706
 1.9   0.9713  0.9719  0.9738  0.9744  0.9756  0.9761  0.9767
 2.0   0.9772  0.9778  0.9793  0.9798  0.9808  0.9812  0.9817
 2.1   0.9821  0.9826  0.9838  0.9842  0.9850  0.9854  0.9857
 2.2   0.9861  0.9864  0.9875  0.9878  0.9884  0.9887  0.9890
 2.3   0.9893  0.9896  0.9904  0.9906  0.9911  0.9913  0.9916
 2.4   0.9918  0.9920  0.9927  0.9929  0.9932  0.9934  0.9936
 2.5   0.9938  0.9940  0.9945  0.9946  0.9949  0.9951  0.9952
 2.6   0.9953  0.9955  0.9959  0.9960  0.9962  0.9963  0.9964
 2.7   0.9965  0.9966  0.9969  0.9970  0.9972  0.9973  0.9974
 2.8   0.9974  0.9975  0.9977  0.9978  0.9979  0.9980  0.9981
 2.9   0.9981  0.9982  0.9984  0.9984  0.9985  0.9986  0.9986
 3.0   0.9987  0.9987  0.9988  0.9989  0.9989  0.9990  0.9990
 3.1   0.9990  0.9991  0.9992  0.9992  0.9992  0.9993  0.9993
 3.2   0.9993  0.9993  0.9994  0.9994  0.9995  0.9995  0.9995
             Second Decimal Place in z
  z       0.09    0.07    0.05    0.03    0.01    0.00
-3.2    0.0005  0.0005  0.0006  0.0006  0.0007  0.0007
-3.1    0.0007  0.0008  0.0008  0.0009  0.0009  0.0010
-3.0    0.0010  0.0011  0.0011  0.0012  0.0013  0.0013
-2.9    0.0014  0.0015  0.0016  0.0017  0.0018  0.0019
-2.8    0.0019  0.0021  0.0022  0.0023  0.0025  0.0026
-2.7    0.0026  0.0028  0.0030  0.0032  0.0034  0.0035
-2.6    0.0036  0.0038  0.0040  0.0043  0.0045  0.0047
-2.5    0.0048  0.0051  0.0054  0.0057  0.0060  0.0062
-2.4    0.0064  0.0068  0.0071  0.0075  0.0080  0.0082
-2.3    0.0084  0.0089  0.0094  0.0099  0.0104  0.0107
-2.2    0.0110  0.0116  0.0122  0.0129  0.0136  0.0139
-2.1    0.0143  0.0150  0.0158  0.0166  0.0174  0.0179
-2.0    0.0183  0.0192  0.0202  0.0212  0.0222  0.0228
-1.9    0.0233  0.0244  0.0256  0.0268  0.0281  0.0287
-1.8    0.0294  0.0307  0.0322  0.0336  0.0351  0.0359
-1.7    0.0367  0.0384  0.0401  0.0418  0.0436  0.0446
-1.6    0.0455  0.0475  0.0495  0.0516  0.0537  0.0548
-1.5    0.0559  0.0582  0.0606  0.0630  0.0655  0.0668
-1.4    0.0681  0.0708  0.0735  0.0764  0.0793  0.0808
-1.3    0.0823  0.0853  0.0885  0.0918  0.0951  0.0968
-1.2    0.0985  0.1020  0.1056  0.1093  0.1131  0.1151
-1.1    0.1170  0.1210  0.1251  0.1292  0.1335  0.1357
-1.0    0.1379  0.1423  0.1469  0.1515  0.1562  0.1587
-0.9    0.1611  0.1660  0.1711  0.1762  0.1814  0.1841
-0.8    0.1867  0.1922  0.1977  0.2033  0.2090  0.2119
-0.7    0.2148  0.2206  0.2266  0.2327  0.2389  0.2420
-0.6    0.2451  0.2514  0.2578  0.2643  0.2709  0.2743
-0.5    0.2776  0.2843  0.2912  0.2981  0.3050  0.3085
-0.4    0.3121  0.3192  0.3264  0.3336  0.3409  0.3446
-0.3    0.3483  0.3557  0.3632  0.3707  0.3783  0.3821
-0.2    0.3859  0.3936  0.4013  0.4090  0.4168  0.4207
-0.1    0.4247  0.4325  0.4404  0.4483  0.4562  0.4602
-0.0    0.4641  0.4721  0.4801  0.4880  0.4960  0.5000
Answers to the Exercises


CHAPTER 2 
2-1. KH, QH, JH, KD, QD, JD


2-2. (a) S = {x | x > 0 and x rational}
(b) E = {x | 1,000 < x < 1,000,000 and x rational}


2-3. (a) S = {1, 2, 3, ..., 25} (b) E = {1, 3, 5, ..., 25}

2-4. (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)

2-5. (a) 6 (b) 5 (c) 2 (d) 8

2-6. BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG

2-7. ~E = {2, 4, 6, ..., 24}

2-8. KC, QC, JC


2-9. A ∪ B = {x | 1,000 < x < 500,000 and x rational},
A ∩ B = {x | 50,000 < x < 100,000 and x rational}


2-10. (H, 3), (H, 4), (H, 5), (H, 6)
2-11. E ∪ F = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1), (1, 1), (2, 2), (4, 4), (5, 5), (6, 6)}
E ∩ F = {(3, 3)}




E = {GGG, GGB, GBG, GBB}, F = {GBG, GBB, BBG, BBB},
E ∪ F = {GGG, GGB, GBG, GBB, BBG, BBB},
E ∩ F = {GBG, GBB}

(a) “You are not taking either a mathematics course or an
economics course” is equivalent to “you are not taking a
mathematics course and you are not taking an economics course.”

(b) “You are not taking both a mathematics course and an
economics course” is equivalent to “you are either not
taking a mathematics course or you are not taking an
economics course.”

46 

92 

25 

92 

61 


(a) 11 (b) 17 (c) 44 (d) 50 


208 

1296; 360 
8,000,000; 483,840 
3,991,680 


5040 




2-31. 24,360 
2-32. 17,280 

2-33. 10,080 

2-34. 4,060 

2-35. 2,598,960 

2-36. (a) 1,287 (b) 5,148 (c) 144 
2-37. 1,756,755 

2-38. 146,107,962 

2-39. 34,650 

2-40. 27,720 

2-41. 1,680 

2-42. 280 

2-43. $16s^4 - 32s^3t + 24s^2t^2 - 8st^3 + t^4$
2-44. —48,384 

2-47. 880 

CHAPTER 3 

3-1. 38 

3-2. 7/8 

3-3. (a) 3/16 (b) 9/16

3-4. 47/68 = .6912 

3-5. (a) 1/6 (b) 1/18 (c) 1/6




17/78 ≈ .2179

(a) 1/30 (b) 1/2 (c) 1/5
(a) .5729 (b) .0651
.0079

.6271

.6501

31/33 ≈ .9394

.0014

.0475

(a) 1:5 (b) 17:1


b/(a + b)


459 
54 


(a) .721 (b) .183


(a) .0588 (b) .5588 (c) .3824 


3/7 




3-28. 


3-29. 


1/2 

.0859

(A, C)

Dependent

(a) .63 (b) .33
.8574

.2696

No

(a) 29% (b) .2759
5/9

(a) .7059 (b) .7273
.1905

(a) .5581 (b) .0175
1/4 

.6087 

2442 

.05 

.60 

.256 


.48 




3-50. .52 
3-51. .33
3-52. .40 
3-53. 2/5 
3-54. .173 
3-55. 4 

3-56. 467 
3-57. 1/2 
3-58. .53 
3-59. .657 
3-60. .0141 
3-61. .2922 
3-62. .21955 
3-63. .40 
3-64. .42 
CHAPTER 4 


41, DNusberofhe )[ V [3 | 2 [3] 


4-2. p(x) = 1/10, x = 0, 1, ..., 9


4-3. $p(x) = (1/6)(5/6)^x$, x = 0, 1, 2, ...;
$F(x) = 1 - (5/6)^{x+1}$, x = 0, 1, 2, ...




4-4.
  x     p(x)    F(x)
  2     1/36    1/36
  3     1/18    1/12
  4     1/12    1/6
  5     1/9     5/18
  6     5/36    5/12
  7     1/6     7/12
  8     5/36    13/18
  9     1/9     5/6
 10     1/12    11/12
 11     1/18    35/36
 12     1/36    1


267/108 ≈ 2.47

$114; $114

$1190

5

Modes are 1 and 2
210/36 ≈ 5.8333
3,427.84

(a) .75 (b) .9444

μ = 276; σ = .53587

$\bar{x}$ = 3.64; s = 1.9667


CHAPTER 5
CHAPTER 5 

5-1. (a) 0.2461 (b) 0.05469 

5-2. (a) 0.2907 (b) 0.5155 

5-3. 0.00217 

5-4. (a) 0.1858 (b) μ = 20; $\sigma^2$ = 19.6
5-5. Loss of $14

5-6. (a) .0898 (b) .8670 

5-7. 5,000; 4,500 

5-8. (a) .1754 (b) .2581 (c) .8416
5-9. .9945

5-11. 2/9 ≈ .2222

5-12. .3709

5-13. (a) .2448 (b) 3

5-14. (a) 8.1 (b) 3.199

5-15. 3.25, 1.864

5-16. (a) .3293 (b) .1219

5-17. (a) .2231 (b) .3347 (c) .2510
5-18. 1,900

5-19. (a) .244 (b) .9747 (c) 244
5-20. (a) .0719 (b) .8913

5-23. .0372 

5-24. (a) .0791 (b) .0374




5-25. E(X) = 12; V(X) = 156

5-26. (a) .0783 (b) .0347

5-27. (a) .0751 (b) 15

5-28. (a) .0404 (b) 24 (20 failures and 4 successes)
5-29. E(X) = 25; V(X) = 150 

5-30. .0375 

5-31. 40 (32 failures and 8 successes) 

5-32. (a) .0437 (b) 34

5-34. μ = $13,000; σ = $7,211.10

5-36. .92452 

5-37. 469 

5-38. .0955 

5-39. 2

5-40. 7,231

5-41. .04 

CHAPTER 6 

6-1. (a) 250 (b) 0.6 (c) 1.06 

6-2. 5.8333 

6-3. $E[u(W_1)] = 8.289$; $E[u(W_2)] = 8.926$

6-9. (b) E(X) = (n + 1)/2; $V(X) = (n^2 - 1)/12$




6-10. $M_X(t) = .42 + .30e^t + .17e^{2t} + .11e^{3t}$;
E(X) = .97; $E(X^2) = 1.97$


6-12. e%(4+ 6e”)? 
6-13. Negative binomial with p = .7 and r = 5 


6-14. 1,4, 15, 2, 13, 0, 11, 14, 9, 12, 7, 10, 5, 8, 3 


6-15. 7 
6-16. 2,3,2,2 
6-17. 698.9 


618. 12 + е! 
CHAPTER 7


7-1. (b) F(x) = 0 for x < 0, $.75x^2 + .25x$ for $0 \le x < 1$, and 1 for
$x \ge 1$ (c) $P(0 \le X \le 1/2) = .3125$; $P(1/4 \le X \le 3/4) = .50$

7-2. (a) 6 (b) .6936

7-3. 35 

7-4. (a) 2/π (b) 1/2

7-5. 4343; 2/3; .8471 

7-6. (a) .4055; .6419 (b) ln 2
7-7. 20; 4940

7-8. .625; .0677

7-9. 0.3


7-12. .46875 




7-13. 93.06 

7-14. 1/2 

7-15. 28/15 

7-16. 1/9 

7-17. .57813 
CHAPTER 8 

8-2. 50; 833.33 

8-3. 1/6 

8-4. (a) 3/10 (b) 1/12 

8-5. (a) 42.5; 18.75 (b) 44 minutes
8-7. (a) 70; 300 (b) .7167
8-8. (a) .3818 (b) .1455
8-9. (a) .4512 (b) .1653
8-10. 1.2

8-11. (a) .5654 (b) .1889
8-12. 115

8-13. (a) .4821 (b) .4541
8-15. (a) .0111 (b) .2063
8-16. 1.9179; 9.2420

8-19. (a) .1535 (b) .3679




8-22. 


8-23. 


8-24. 


8-25. 


8-26. 


8-27. 


8-28. 


8-33. 




1.20; .48 

1.50; .1875 

α = 12; β = 2/3

(a) 1 — e (3z--1) (b).9988 (с) .1818 

3270 

(a) .8155 (b) .4238 (c) .6826 (d) .0990 

(a) 0.93 (b) -1.90 (c) -1.35 (d) 0.97 (e) 1.645 (f) 1.96

1 - a; 2a - 1

.8272 (Table), .82689 (TI-83) 

.9793 (Table), .97939 (TI-83) 

(a) .9270 (Table), .92698 (TI-83) 

(b) .9711 (Using Table answer in binomial probability), 
.97104 (using TI-83 answer) 

(a) .7881 (Table), .78815(TI-83) 

(b) .4895 (Using Table answer in binomial probability), 
.48957 (using TI-83 answer) 

.5244 (Table), .524304 (TI-83) 

3.5 

E(Y) = 160.77; V(Y) = 4,484.96

.6684 (Table), .6691 (TI-83) 


.1335 (Table), .1330 (TI-83)


e^ 




8-40. 


(a) .2776(Table), .276668(TI-83) 
(b) .1788(Table), .178096(TI-83) 


μ = 7.7498; σ = 3853

(a) 5.6 (b) 5.9733 (c) 4.8761 (d) .22054 
(a) 700 (b) 200 (c) 93,333.33 

(a) $(1/2)\pi^{1/2}$ (b) $(3/4)\pi^{1/2}$ (c) $(15/8)\pi^{1/2}$
(a) .2007 (b) .1666 

10.522 

(a) .4737 (b) .0613 (c) .66389

31523(1 — 2)!2/32 

105 


.60; .04 


3125 


47178 
42045 
.1915
10,256
4348
173.3
.123


.8185


8-63. .1587 

8-64. .9887 

8-65. .7698 

8-66. 6,342,547.5 

CHAPTER 9 

9-1. 740.82 

9-2. 575.52 

9-3. $E[u(W_1)] = 2.3009$; $E[u(W_2)] = 2.2574$
9-4. $(e^{bt} - e^{at})/[t(b - a)]$ if $t \neq 0$; 1 if $t = 0$
9-5. (b + a)/2

9-6. $(2e^t - 2t - 2)/t^2$ if $t \neq 0$; 1 if $t = 0$
9-7. 1/3

9-8. Gamma with α = 5 and β = 2

9-9,  e*(3/(3 — 2t)] 

9-10. E(X) = 1; V(X) = 2

9-11. $E(X^2) = \mu^2 + \sigma^2$

9-12. (a) ln y (b) 1/y (both on [1, e])

9-13. 1— e7, {огу 0 

9-14. (а) у? (Ы) 32°, ог0<у< 1 

9-15. .80 

9-17. 2, 4,8,6 




9-18. 


9-19. 


9-20. 


9-21. 


9, 6, 2, 3 


F(0) = .90 
F(x) = .90 + .09x/1000, for 0 < x < 1000
F(x) = .99 + .01(x - 1000)/9000, for 1000 < x < 10,000


100 

F(0) = .90 

F(x) = .90 + .10[1 - (200/(x + 200))^3], for x > 0
1 

ete чыз) 

(1 4 zy TES 


(100 - x)/100, 0 < x < 100


500 
5644.30 
2+ 3е72/ 
1.9 

1.7067 
403.436 


998.72 


е 




9-36. 25|In (тууу) – -04] 


9-37. .125e- (19)? (.1y)25 


CHAPTER 10 


[xw |2z9[W3[49] 


EET RCM ИГИ ИЛИ TT 
Ls таз pues |105 pas 


15/45 ЕШ 
0 3/45 


pz) KA 25/45 | 1045| — | 


10-1. 


10-2. 


10-3. E(X) = 20/9; E(Y) = 5/3
10-4. E(X)=1; E(Y)=23/5 

10-5. V(X) = 4/9; V(Y) = 28/75 

10-6. 15/64 

10-7. (a) 1/2 + x, 0 < x < 1 (b) 1/2 + y, 0 < y < 1


10-8. (a) 223 + (3/2)z?,0<2<1 
(b 2/3 + 3y — (2/3)y3 — 3y, O< y <1 




10-9. 


10-10. 
10-11. 
10-12. 
10-13. 
10-14. 
10-15. 


10-16. 


10-17. 


10-18. 
10-19. 
10-20. 
10-21. 


10-22. 


10-23. 
10-24. 
10-25. 
10-26. 
10-27. 


10-28. 


(a) 29/32 (b) 41/96 

72 

1/2 

E(X) = 31/40; E(Y) = 9/20

1/125

(a) (35 - 2x)/150, 0 ≤ x ≤ 5 (b) (55 - 2y)/750, 0 ≤ y ≤ 25


E(X) = 85/36; E(Y) = 325/36 


лыс. 


pali) | 2/9 | 1/3 | 479 


12-43, Оа 

(2x? + 3у)/(22° -(3/22), O< y Ez x 1 

(a) 4/5 + (24/5)y, 0 ≤ y ≤ 1/2 (b) 3/10

(ay 335,09 «1 (b)2xzh^,0e«mr«cy«l (с) 2у/3 
(d) 1/3 

Independent 

Dependent 

Independent 

Dependent 

20% 


.0488 
10-29.


10-30. 


10-31. 


10-32. 


10-33.
10-34.
10-35.
10-36.
10-37.
10-38.
10-39.
10-40.
10-41.
10-42.
10-43.
10-44.
10-45.
10-46.
10-47.
10-48.


10-49. 




.625 


Al 


0.5 pl 1 0.5 
» f(st)dsat+ | f(s,t) ds dt 
o J05 0 Jo 


6 30 p50-r 
"38.006 o f (50 — х ~ y)dy dz 
; 20 J20 


2/5 


488 

1553/2(1 — pl?) 
7/8 

172 

5.78 

.833 


45474 




CHAPTER 11 


11-1. 


11-2. 


ЖИЕ IE ИЕ: 
p(s) 2/27 | 7/27 | 10/27 | 8/27 


11/18 


6(e7 7? — e73) 

95833 

Fs(s) = 1 — e^*(14-) 

(а—/2—1?/2)? 

E(X + Y) = 35/9 = 20/9 + 15/9 = E(X) + E(Y) 


E(X+Y)= 8/9; E(X) = E(Y) = 4/9 


. (а) 5/27 (b) 16/81. (c) –1/81 

. (a) 13/162 (b) 13/162 (c) 11/81 

. 68/81 

. (a) L5 (b) L6 (c) 25 (d) 24 (e) —05 (f) 39 
. (a) 1/20 (b) 3/80 (c) 2/5 (d) 11/80 

. —2041 


. 2774 


| 
— 


a) fx) 1+ т, = div. 


fx(2): fru) # f(x,y) 
(b) E(X) = E(Y) = E(XY) = Cov(X, Y) = 0




11-20. (2e7! + 7e*! + 6e%)/15 

11-21. [(e?* — 1)?/(4t?)] 

11-22. E(S) = n(7/2); V(S) = n(35/12)
11-23. 14/81

11-24. -25/81

11-25. .8198 (Table), .82029 (TI-83) 
11-26. (a) 7.5 (b) 5.5 (c) 1 

11-27. 5 

11-28. (a) 18.75 (b) 2475 (c) 9
11-29. 20.4 

11-30. 4.6 

11-31. 56,364 

11-32. (a) 6x(1 - x), for 0 < x < 1 (b) 1/2 (c) 1/20
11-33. 1/2 

11-34. у2/18 

11-35. 1/30 

11-36. 1/60 

11-37. (a) 5000 (b) 1,666,666.67 
11-38. (a) 3215.48 (b) 606,665.15 
11-39. .9898(Table), .98993(TI-83) 


11-40. 322,434.81 




11-41. 1/64 
1 5/2 
11-42. irf rcos 7x(1+sinrx) dz 
3 


11-43. 2025 
11-44. .71 
11-45. .295 


11-46. 5.72 
11-47. — 
11-48. .414 
11-49. —— ——— for x > 0 


11-50. .8413 
11-51. 11 
11-52. 200 
11-53. 0 
11-54. .041 
11-55. 6 
11-56. 5,000 
11-57. 10,560 
11-58. 19,300 


11-59. 8.80 
11-60. .2743

11-61. 16 

11-62. .03139

11-63. 328 

CHAPTER 12 

12-1. A would win in 13 tosses 
12-2. 13 

12-3. 2, 3, 3

12-4. (a) [.45, .55] (b) [.43, .57]
12-5. (a) [.504, .496] (b) [.54144, .45856]
12-6. [.22, .33, .45]

12-7. [.47, .28, .25]

12-8. [5/12, 7/12]

12-9. [9/16, 7/16]

12-10. [11/57, 20/57, 26/57] 


12-11. [15/37, 12/37, 10/37] 


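12-13. (a) $\begin{bmatrix} 9/7 & 3/7 \\ 6/7 & 9/7 \end{bmatrix}$ (b) $\begin{bmatrix} 6/7 & 1/7 \\ 4/7 & 3/7 \end{bmatrix}$ (c) 4/7

12-14. (a) $\begin{bmatrix} 7/5 & 3/5 & 1/5 \\ 6/5 & 9/5 & 3/5 \\ 4/5 & 6/5 & 7/5 \end{bmatrix}$ (b) $\begin{bmatrix} 14/15 & 1/15 \\ 4/5 & 1/5 \\ 8/15 & 7/15 \end{bmatrix}$ (c) 4/5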


[1] 


[2] 


[3] 


[4] 


[5] 


[7] 


[8] 


[9] 